PhotoSauce Blog


*Note: If you’re just here for the profiles, I have published those in a new github repo. Get them all here.

Thanks to some much-needed vacation time, it’s taken me a while to get to this final part of the series, but now it’s time to put everything together and get some profiles finalized. In the first three parts of this series, I examined ways to pack an ICC v2 profile as small as possible, an approach for finding an ideal point-based TRC fit with the minimum size, and how to derive the correct color primaries and whitepoint for an sRGB-compliant profile. In this final part, I will assemble some profiles using those techniques/values and test them out. I had difficulty devising real-world tests that would demonstrate the differences between profiles, but I think I’ve finally nailed down some good approximations that are fair and realistic.

My initial test plan was simply to re-create the worst case scenario for profile conversion. If a profile performs acceptably in the worst case, it should do even better under less extreme circumstances. For this reason, I decided to focus on conversions from sRGB to ProPhoto RGB. The thinking behind this is that an embedded sRGB profile will be used to convert to other colorspaces, and the colorspace that is the most different from sRGB would be the worst case. It would be possible to construct a custom colorspace that would be even more different than ProPhoto, but that wouldn’t be realistic. ProPhoto is a real colorspace that people actually use, and it has both a gamut that is much, much larger than sRGB and a response curve that is quite different (reference gamma 1.8 vs 2.2). An even more common scenario might be something like sRGB to Adobe RGB or Rec. 2020, but again, if a profile does well with ProPhoto, the others will work even better.

The Reference Image

Having settled on an evaluation strategy, I needed to pick some test images. This turned out to be more difficult than I anticipated. I originally selected a few real-world images that had extremely saturated colors and a few with lots of different shades of blue and green. These are areas where ProPhoto and sRGB would have maximum differences, and that should highlight any errors. Unfortunately, I found it was impossible to compare fairly with real-world images for two main reasons:

  1. No real-world image covers the entire color gamut of sRGB, so an error might not show up simply because the color value that would show the error isn’t present in the image.
  2. Real-world images tend to have areas of repeated pixel values. This means that if one profile causes a specific color to have an error, and if that color is over-represented in the image, it amplifies the error measured from the profile.

For those reasons, I settled on testing with a single reference image. That image comes from Bruce Lindbloom’s site and is a 16.7megapixel generated image that simply contains every color combination possible with 8-bit RGB. The image consists of 256 squares, each with a different blue value. And each of those squares consists of 256 rows and 256 columns, where the red value increases in each column and the green value increases in each row. I found this image makes it easy to see exactly where the errors are focused.

The Reference Profile

The second problem I had was establishing a reference to compare to. In testing my tone reproduction curves, I tested each candidate curve against the true sRGB inverse gamma curve. For the final profile testing, however, I wanted to test real images with real profiles using a real CMS. So I needed a real ICC profile to serve as a reference. Unfortunately, as we discovered in Part 3 of this series, there aren’t any profiles I could find anywhere that are truly sRGB-compliant. Nor could I use the standard 1024-point TRC as a reference, because one thing I want to evaluate is whether the 182- and 212-point curves I found in Part 2 might actually be better than the 1024-point curve used in most profiles.

This series is focused on creating v2 ICC profiles, but v4 profiles have a newer feature that allows the TRC to be defined as a parametric curve rather than a point-based curve with linear interpolation. The parametric curve type allows the sRGB gamma function to be duplicated rather than approximated. Software support for v4 profiles is not great, so they aren’t used frequently, but a v4 profile with a parametric curve would serve as a good reference for testing my v2 profiles. That left me with a new problem, which was to find an optimal v4 profile.

Although the parametric curve type can duplicate the sRGB curve’s basic logic, the parameters themselves are defined in the ICC s15Fixed16Number format, meaning they have limited precision. I decided to evaluate the accuracy of a v4 sRGB curve using the same measures I used to evaluate my point-based curves in order to see how close it was to the true sRGB curve. Once again, I started with an example from Elle’s profile collection.

Here are the stats from that profile’s TRC compared with the best-performing point-based curves from Part 2.

Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
   182 |  0.001022 |   0.000092 |  0.000230 |   0.003107 |    0.000440 |   0.000736 | 0
   212 |  0.001650 |   0.000118 |  0.000357 |   0.002817 |    0.000449 |   0.000707 | 0
  1024 |  0.008405 |   0.000205 |  0.000996 |   0.003993 |    0.000475 |   0.000819 | 0
  4096 |  0.008405 |   0.000175 |  0.000860 |   0.003054 |    0.000472 |   0.000782 | 0
    v4 |  0.000177 |   0.000034 |  0.000051 |   0.000564 |    0.000317 |   0.000371 | 0

As you can see, the v4 parametric curve results in significantly less error than even the best point-based options. Its error, however, is still surprisingly high. Let’s take a look at the parameter values from that profile and see why that is.

Param | sRGB Value     | sRGB Decimal   | Profile Hex | Profile Decimal | Diff
    g | 2.4            | 2.4            |  0x00026666 |  2.399993896484 | -6.103516e-6
    a | 1.000/1.055    | 0.947867298578 |  0x0000f2a7 |  0.947860717773 | -6.580805e-6
    b | 0.055/1.055    | 0.052132701422 |  0x00000d59 |  0.052139282227 |  6.580805e-6
    c | 1.000/12.92    | 0.077399380805 |  0x000013d0 |  0.077392578125 | -6.802680e-6
    d | 0.04045        | 0.04045        |  0x00000a5b |  0.040451049805 |  1.049805e-6

Once quantized to s15Fixed16Number format, none of the numbers stored in the profile are exactly correct, and two of the parameters that have the largest impact on the output value (g and a) are both rounded down.  Rounding both values in the same direction effectively combines their error. I decided to try ‘nudging’ all the parameter values to try to find a better fit than was produced by simple rounding. It turned out, the best fit I was able to achieve was by bumping the ‘g’ value up and leaving all the rest as they were. By using a ‘g’ value of 0x00026669, or 2.400039672852, I was able to cut the error to less than half that of the rounded values.

Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
    v4 |  0.000177 |   0.000034 |  0.000051 |   0.000564 |    0.000317 |   0.000371 | 0
   ^v4 |  0.000088 |   0.000012 |  0.000022 |   0.000240 |    0.000124 |   0.000143 | 0

While it’s not perfect, that is as close as it’s possible to get to the true sRGB inverse gamma function in an ICC profile. So with that and the primary colorant values from Part 3, I had my reference profile. I decided while I was making a reference profile, I may as well make it as small as I could so that I would have another compact profile option for embedding. That profile is here.

I also decided to create a reference v4 ICC profile for ProPhoto to use as my destination profile. That one was much simpler in that the default rounded values worked out to be the closest fit for the TRC, and the colorant values have a single, unambiguous definition.  That profile is here.

The Reference CMS(s)

Once again, this turned out to be more complicated than I anticipated.  One might expect that with the detail in the ICC specifications, there wouldn’t be much difference between CMS implementations. I’ve worked predominately with the Windows Color System (WCS) by way of the Windows Imaging Component (WIC), and I always assumed it did a reasonable job at color conversions. However, when looking for an easy way to test conversions using multiple profiles, I stumbled on the tifficc command-line utility from Little CMS.

In testing with tifficc, I found that the results were mostly in line with my expectations, but when using my candidate profiles as a target rather than a source, it appeared Little CMS was doing an upgrade or substitution of the specified profile to an internal reference version of the colorspace. That’s definitely a desirable behavior in that it ensures correct output regardless of minor profile differences, but it’s not desirable when trying to measure those minor differences. WCS, on the other hand, produced output different for each profile, and entirely different from Little CMS. And while that output was more in line with my expectations from my previous testing, it seems that it might not be as correct.

I had been planning for some time to replace some of my dependencies on WCS with my own color management implementation, but this has provided the final push I needed to motivate me to get it done. In the end, I decided to consider the output from both CMSs, so it would be easier to predict what might happen when using the profiles in different scenarios and with different software.

The Reference Scenario

This is where things finally get easy. The purpose of a compact profile is to be embedded in an image. Obviously, it would only be embedded in an image of a matching colorspace and would only be used as a source profile in those cases. I had already chosen ProPhoto as a destination colorspace because of its extreme difference from sRGB while still being a realistic conversion path. And having already decided to use a v4 ProPhoto profile as a reference destination, that left only one choice to make. I had already decided that I would test with an 8-bit reference input image because that’s the most common image type in the wild. But wide-gamut colorspaces like ProPhoto are not well-suited for use with 8-bit images. Squishing the sRGB gamut down into its corresponding place in the ProPhoto gamut at 8-bit resolution tends to cause posterization. So I decided to test 8-bit sRGB input and 16-bit ProPhoto output. I was also able to test the reverse of that transform, going from the 16-bit ProPhoto images back to 8-bit sRGB. In the interest of time, I won’t document the full details of those tests, but those tests are the ones that led to my conclusion that Little CMS does some kind of profile substitution and that WCS is probably Not Very Good. I may do another post on that at some point in the future.

For the Little CMS trials, I used a command-line similar to the following:

tifficc -v -t1 -isrgb-v4-ref.icc -oprophoto-v4-ref.icc -e -w rgb16million.tif rgb16milpp-ref.tif

For the WCS trials, I wrote a small utility that uses the WIC IWICColorTransform interface to perform the same conversion.

The Measurements

Having established a reference scenario and created a reference profile, all I had to do was run the conversion(s) in question using the reference profile as well as all the v2 candidates and then compare their output. I also figured it would be worthwhile to try some variants using more common values, like the 1024- and 4096-point TRCs and the ArgyllCMS and HP colorant values. That should allow a complete picture of how the candidate profiles perform as well as a good basis for understanding which parts of the profile contribute to greater differences in output.

Measuring the differences between the profile output mathematically is a simple task, but I wanted to be able to visualize those differences for easier comparison and so that it would be possible to see not only how much difference there was, but also where the differences occurred. I considered using ΔE-CIE2000 for these comparisons, but the reality is, the results are so close visually that there isn’t much meaning to the color difference. I also found the results of the raw differences interesting because of the way the visualization shows patterns in the error.

I’ve referenced the Beyond Compare image comparison tool a few times before because I like the way it works and the way it shows differences. The only problem with using it for these tests is that while it does load 16-bit images, it seems to convert them to 8-bit before doing the comparison. That means I couldn’t get the kind of detail I wanted to see in the diffs. Normally, when I use that tool, I set it up with a threshold of 1, meaning its default visualization will show pixels that are equal between two images in greyscale, pixels that are off by 1/255 in blue, and pixels that are off by more than 1/255 in red. In doing some trials with 8-bit output and comparing them in Beyond Compare, I found that none of the profiles in my test suite created output that differed from the reference by more than 1 on any given pixel. That’s good news, in that it backs up the theory that none of the profiles I’m testing will produce output that is significantly different visually. But it would make it difficult to draw any conclusions about which profiles are better, especially when the differences get more subtle. That issue, combined with the fact that ProPhoto isn’t recommended for 8-bit images anyway, led me to create my own variant of the image comparison tool that worked at higher bit-depth.

The second problem was visualizing the differences. As I said, I like the way Beyond Compare does it, but when you compare 16-bit images, it’s difficult to find a threshold for color-coding the differences. I ended up with something like the original but enhanced for the extra sample resolution. Instead of coloring different pixels either solid blue or solid red depending on the threshold, I created a gradient from blue to red. I chose the threshold rather arbitrarily, setting it at 65/65535, or roughly 0.1%. That threshold worked out well in that allowed me to create a gradient from blue to red for differences between 1 and 65, and then differences over 65 could be colored solid red. Note that the solid red doesn’t necessarily mean a difference would be distinguishable visually. And as you’ll see, differences that great were very rare in the tests anyway.

And finally, I added some stats to the diff images to provide a little more detail than can be seen visually. I grouped the errors into four buckets (1-17,18-33,34-49,50-65) and added raw pixel counts for each error bucket, plus the count of pixels over the 65 threshold. I also calculated the Max, Mean, and Root Mean Square error for each test image versus the reference. Those stats are written into the upper-left corner of each diff image.

The Results

From here on out, there will be a lot of images. These are the diff images created by the tool I described above. Again, grey pixels in the image indicate that the candidate profile produced output identical to the reference profile. Pixels that are tinted blue represent the smallest differences, and there is a gradient from blue to purple to red (more pink, really) for increasing error levels. Finally, any pixels colored solid red were different by more than 65/65535. All the images below are thumbnails, and you can click them to get the full diff image. Be aware, though, the diff images are 16.7megapixels in size, so don’t click them if you’re on a device that can’t handle the size (bytes or pixels). Oh, and the diff images themselves are 8-bit, even though they represent differences between 16-bit images. Since the diff is just a visualization, I wanted to keep them as small as possible. They’re already as much as 10MiB each saved as 8-bit-per-channel PNGs.

For each profile, I’ll include the results from both the LCMS tifficc utility and my WCS/WIC conversion utility. The differences are interesting to see.

The Colorant Factor

I’ll start with the effect of different primary colorant values used in common sRGB profiles. In Part 3 of this series, I looked at the differences between the odd colorant values from the HP/Microsoft sRGB profile as well as the Rec. 709-derived colorants used by the ArgyllCMS reference sRGB profile. For these tests, I created v4 ICC profiles using the same modified parametric curve from my reference profile, so that only the colorants are different. Converting the reference image to ProPhoto using those as source profiles, these are the diffs compared with output from my reference sRGB profile.

CMSLCMSDiff Counts
TRCv4 Ref18-330
Max Diff1634-490
Mean Diff5.464350-650
RMS Diff5.8626>650

CMSLCMSDiff Counts
ColorsRec. 7091-1713.6M
TRCv4 Ref18-330
Max Diff1634-490
Mean Diff1.741250-650
RMS Diff2.2036>650

And the same using WCS

CMSWCSDiff Counts
TRCv4 Ref18-33950745
Max Diff6334-4917799
Mean Diff6.357050-651955
RMS Diff8.4811>650

CMSWCSDiff Counts
ColorsRec. 7091-178.6M
TRCv4 Ref18-33928751
Max Diff6334-4917484
Mean Diff4.712950-651955
RMS Diff7.5958>650

The thing that really stands out to me is the difference in the way these profiles are handled by LCMS and WCS. Based on the splotchiness of the WCS diff images (you’ll have to view them full-size to see), my guess is that it’s using lower-precision calculations than LCMS. In both cases, though, the differences are quite small and should be below the threshold of visible difference. That’s certainly the case with the Rec. 709 colors vs the sRGB reference colors, but the unbalanced colors from the HP/Microsoft profile don’t result in as much difference in the converted result as one might expect. I think the differences here also make a good reference point for determining the significance of the differences caused by different TRC approximations.

The 26-Point Curves

In Part 2 of this series, I did some detailed analysis of both the TinyRGB/c2 26-point approximated TRC and the proposed ‘improved’ curve used in the sRGBz profile. That analysis predicted that the sRGBz curve would perform less well than the TinyRGB curve, and it found another alternate 26-point curve that it predicted would do better. I figured some real-world testing of those predictions would be a good start. Although I measured and tuned the curves primarily using ΔL, which is a measure of visual difference, we can see that the results are the same even when measuring absolute pixel differences after conversion.

Note that in testing these curves, I created new profiles that all shared the same reference sRGB colorant values to limit any differences to the curves themselves.

It’s difficult to see in the thumbnails, but at full size, the visualization shows pronounced improvement in error levels between my alternate 26-point curve and either of the others. The sRGBz curve has both the largest mean error and the most individual pixels with high error levels.

CMSLCMSDiff Counts
ColorssRGB Ref1-1713.6M
Max Diff6734-49243867
Mean Diff13.647150-6517624
RMS Diff15.8336>6559

CMSLCMSDiff Counts
ColorssRGB Ref1-1714.6M
Max Diff7234-49143994
Mean Diff13.553350-6513082
RMS Diff15.1258>651

CMSLCMSDiff Counts
ColorssRGB Ref1-1714.7M
TRC26-Point Alt18-332M
Max Diff6434-49111588
Mean Diff13.406950-653636
RMS Diff14.8904>650

And again with WCS

CMSWCSDiff Counts
ColorssRGB Ref1-1711.9M
Max Diff6934-49487042
Mean Diff15.661150-6551775
RMS Diff18.7551>65335

CMSWCSDiff Counts
ColorssRGB Ref1-1713.1M
Max Diff6234-49269412
Mean Diff13.874650-6513434
RMS Diff16.2264>650

CMSWCSDiff Counts
ColorssRGB Ref1-1713.2M
TRC26-Point Alt18-333.3M
Max Diff5434-49275720
Mean Diff13.911250-655374
RMS Diff16.2414>650

Once again, the output from WCS seems to have amplified the error in the profiles, but the relative results are the same. The sRGBz curve is less accurate than TinyRGB’s, which is less accurate than my alternate 26-point curve. It’s also worth noting how much more error these curves contribute compared to the error from the alternate primary colorants. This level of error is still quite acceptable for the profiles’ primary intended use-case, but we’ll look at some other options.

The Alternate Compact Curves

I picked out a few of the interesting compact curves that my solver found in Part 2 of this series to see how they compare in terms of size/accuracy ratio. Here are those comparisons, again using both CMS’s. First LCMS…

CMSLCMSDiff Counts
ColorssRGB Ref1-178.0M
Max Diff8234-491.1M
Mean Diff23.141450-65212433
RMS Diff25.6127>6518122

CMSLCMSDiff Counts
ColorssRGB Ref1-1716.5M
Max Diff4834-4912
Mean Diff8.749550-650
RMS Diff9.6480>650

CMSLCMSDiff Counts
ColorssRGB Ref1-1716.7M
Max Diff3234-490
Mean Diff5.139850-650
RMS Diff5.7177>650

CMSLCMSDiff Counts
ColorssRGB Ref1-1716.1M
Max Diff3234-490
Mean Diff2.459250-650
RMS Diff2.8069>650

And once more using WCS

CMSWCSDiff Counts
ColorssRGB Ref1-178.1M
Max Diff8234-491.5M
Mean Diff23.043950-65227729
RMS Diff26.0615>6534926

CMSWCSDiff Counts
ColorssRGB Ref1-1715.6M
Max Diff3834-491162
Mean Diff9.666250-650
RMS Diff11.4510>650

CMSWCSDiff Counts
ColorssRGB Ref1-1716.6M
Max Diff2534-490
Mean Diff7.396250-650
RMS Diff8.4588>650

CMSWCSDiff Counts
ColorssRGB Ref1-1716.7M
Max Diff2334-490
Mean Diff6.725050-650
RMS Diff7.3211>650

And now a few notes on these interesting curves…

Although the 20-point curve diff images look like a bit of a bloodbath, allow me to point out a couple of things. First, as I mentioned before, I chose the threshold for the full red pixels rather arbitrarily. I wanted the small differences between all the profile variants to be visible even in the thumbnails here, and I chose my thresholds based on my choice of a 64-value gradient for the smaller errors. Red doesn’t necessarily mean danger in this case; it just means the error is higher than the worst from the other curves. Not a lot worse, mind you, but the line has to be drawn somewhere, and I just happen to have drawn that line just under the max error of the 20-point curve. Second, you’ll note that most of the worst error is concentrated toward the upper left of the image and the upper left of each square within the image. These are the darker parts of the image, where a larger absolute pixel difference represents a smaller visual difference than it would at the mid-tones. The choice to concentrate the error in those areas less visible was a key part of the tuning algorithm used in my curve solver. I believe the 20-point curve is perfectly adequate for some 8-bit image embedding, particularly for thumbnail-sized images where file size is important. Weighing in at only 410 bytes, I believe this is the smallest possible usable sRGB-compatible profile.

The other three candidate curves performed very well indeed. The 32-point curve is quite a significant improvement over the 26-point curves used in the existing compact profiles and with a cost of only 12 additional bytes in the profile. So once again, I’ll say that 26 is not a magic number in this case. But really, if you’re looking for a magic number, wouldn’t you just skip straight to 42? The error level in the 42-point curve is quite good. It’s actually awfully close to the error caused by the bad colorant values used in the very popular HP/Microsoft sRGB profile, so it makes an excellent compromise if you’re looking to save space. The 63-point curve halved the error of the 42-point curve when using LCMS but didn’t do as much better with WCS, so while it would also be a good choice, I think 42 is the magic number for my compact profile.

The Big Curves

That just leaves us with the larger curves to evaluate. My solver identified curves of 182 and 212 points that appeared to be a closer fit to true sRGB than the standard 1024- and 4096-point curves used in many profiles. I wanted to see if that was true in a real-world test. Here are the results when using all four of those.

CMSLCMSDiff Counts
ColorssRGB Ref1-177.6M
Max Diff1634-490
Mean Diff0.743650-650
RMS Diff1.2585>650

CMSLCMSDiff Counts
ColorssRGB Ref1-178.2M
Max Diff1634-490
Mean Diff0.773650-650
RMS Diff1.2533>650

CMSLCMSDiff Counts
ColorssRGB Ref1-178.4M
Max Diff1634-490
Mean Diff0.787950-650
RMS Diff1.2471>650

CMSLCMSDiff Counts
ColorssRGB Ref1-172.8M
Max Diff1634-490
Mean Diff0.216450-650
RMS Diff0.5570>650

And repeated one last time using WCS

CMSWCSDiff Counts
ColorssRGB Ref1-1716.6M
Max Diff2334-490
Mean Diff5.784950-650
RMS Diff6.2883>650

CMSWCSDiff Counts
ColorssRGB Ref1-1716.6M
Max Diff2334-490
Mean Diff5.651250-650
RMS Diff6.1638>650

CMSWCSDiff Counts
ColorssRGB Ref1-1716.6M
Max Diff1734-490
Mean Diff5.663450-650
RMS Diff6.1857>650

CMSWCSDiff Counts
ColorssRGB Ref1-1716.6M
Max Diff1734-490
Mean Diff5.600650-650
RMS Diff6.1383>650

I must admit, these results have left me a bit puzzled. With LCMS, the 182- and 212-point curves generally outperformed the 1024-point curve, as I expected. But the 4096-point curve really blew the others away. In doing some round-trip testing early on with LCMS, I found that it was producing perfect output with the 1024- and 4096-point TRCs when it really shouldn’t have, so I suspected there may be some sort of internal substitution happening with those larger TRCs. I tested that theory out by modifying the first 20 points of the 1024-point profile to use the same large value. The output didn’t change, which lends some support to that theory. I didn’t dig into the LCMS code to see what’s happening, but I can say that the same substitution/upgrade did not occur when using my 182- or 212-point TRCs. So if you’re using LCMS and want the most accuracy (for destination profiles at least), you may be better off sticking with the standard TRCs. When used as a source profile, however, there is something special about those smaller curves. I think they’ll work nicely for embedded profiles when size is a factor.

The results when using WCS to convert were a bit more in line with my expectations. The 4096-point curve had more pixels with larger individual errors compared to the 1024-point curve, but it made up some of the gap in the average error by better fitting the top part of the curve. The 182- and 212-point TRCs performed admirably but didn’t offer an upgrade over the larger versions. Again, they have almost the same accuracy as the standard curves, so if size is a concern, they’re a viable option. I’ll go ahead and publish a profile that uses the 212-point curve because I think it has some value, but it’s not quite the upgrade over the standard curves I thought it might be.

The CMS Factor

I found the differences between the results with the two CMS’s interesting enough to do all the tests in both, but I wanted to show one last comparison to lend a bit of perspective to all the other comparisons/diffs in this post. Here’s what it looks like when you compare the reference outputs from each CMS with each other.

ColorssRGB Ref1-1711.1M
TRCv4 Ref18-332.9M
Max Diff40434-491.0M
Mean Diff22.793650-65563030
RMS Diff35.1353>651.1M

Now that’s a bloodbath. And that serves to make an important point: the CMS implementation details can easily have a far greater impact on the output results than any of the profile aspects we’ve looked at. I’m not quite sure which is more accurate between LCMS and WCS, but I strongly suspect it’s the former. I’ll do some more testing on that as I do my own color conversion implementations in MagicScaler. If I find anything interesting and if I remember, I’ll come back and update this post.

Post comment