Making a Minimal sRGB ICC Profile Part 2: Curve the Curves
In Part 1 of this series, I examined Facebook’s TinyRGB (c2) ICC profile, following on from the work Øyvind Kolås (Pippin) did in creating his sRGBz profile. I was able to trim an extra 68 bytes off that profile (making 100 bytes total reduction off TinyRGB) by careful packing of the data, and now I turn my attention to the tone reproduction curve (TRC) tags and their shared content.
In his sRGBz post, Pippin discusses the Facebook decision to use 26 points in their tone reproduction curve. The Facebook post explains that this was done because the linear part of the sRGB curve ends about 1/25th of the way in, making that a natural place for the second TRC point to fall. In fact, the sRGB curve is defined as having a linear segment up to a value of precisely 0.04045, which is awfully close to 1/25. That makes a sensible place to start testing, but it seems they decided that was the magic number and went full speed ahead without bothering to check others.
The tricky thing about optimizing a point-based curve approximation for an ICC profile is that the curve points have to be spaced at even intervals. If we were allowed to space them arbitrarily, we could define the linear segment precisely with two points and then use as many points as we wanted to tune the curvy part of the curve. But with even spacing, options are much more limited, and the performance of curves with different numbers of points defined can be quite unpredictable. It makes sense, then, that the Facebook team would choose 26 as a starting point.
However, Pippin failed to find any compelling evidence that 26 is disproportionately better than other surrounding numbers. My check of their math results in the same conclusion, but I arrived at it in a different way, which I’ll be getting to. 26 points produce a decent curve, but in that size range, more is better and fewer is not necessarily a lot worse. What’s nice about a 26-point curve is that at 2 bytes per point, plus the 12-byte header, the curve is a nice even 64 bytes. And that’s about the only special thing it has going for it.
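The size arithmetic is easy to check. This is a tiny sketch that assumes only the 12-byte header and 2 bytes per point mentioned above; `curve_size` is a name invented here for illustration.

```python
def curve_size(points: int) -> int:
    """Bytes needed for a point-based curve: 12-byte header plus 2 bytes per point."""
    header_bytes = 12
    bytes_per_point = 2
    return header_bytes + bytes_per_point * points


for n in (25, 26, 27):
    print(n, "points ->", curve_size(n), "bytes")
```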
In Search of Magic Numbers
Is Facebook’s curve the best curve you can get with 26 points? And if 26 isn’t the magic number, is there one?
I was intrigued by Pippin’s alternate proposed curves, so I set out to do some testing of my own using his as a starting point. One thing that stood out to me immediately was that he optimized the curves for minimum mean absolute error. Generally, when testing sample fit to a curve, root-mean-square error is more meaningful, because it gives more weight to points that are further off the curve. Large individual errors are definitely undesirable in this case, so that seems a better measure. I was also interested in seeing the max error for that reason. I set up some code to interpolate the 256 values that would be found in an 8-bit JPEG’s color channels, compared them to the values calculated using the actual sRGB inverse gamma function, and measured the max error, MAE, and RMSE for his curves vs the TinyRGB/c2 curve.
Points | Max Error | Mean Error | RMS Error | Point Values
23 | 0.000587 | 0.000148 | 0.000194 | 0,229,544,1072,1796,2744,3937,5384,7104,9104,11396,13995,16912,20157,23735,27657,31937,36573,41589,46976,52754,58916,65535
24 | 0.000675 | 0.000136 | 0.000180 | 0,219,509,993,1655,2521,3605,4920,6476,8288,10364,12716,15353,18283,21517,25062,28924,33115,37636,42500,47710,53277,59193,65535
25 | 0.000544 | 0.000125 | 0.000166 | 0,210,483,924,1533,2322,3315,4513,5928,7581,9468,11605,14003,16660,19597,22813,26312,30116,34214,38621,43348,48385,53766,59452,65535
*26 | 0.000449 | 0.000119 | 0.000146 | 0,203,457,867,1426,2155,3062,4159,5457,6964,8689,10640,12824,15250,17925,20855,24045,27504,31237,35259,39548,44137,49021,54211,59696,65535
26 | 0.000464 | 0.000115 | 0.000150 | 0,202,455,864,1423,2154,3060,4156,5454,6960,8689,10637,12821,15247,17920,20855,24042,27501,31233,35247,39549,44132,49018,54208,59695,65535
27 | 0.000483 | 0.000106 | 0.000138 | 0,194,429,812,1327,2001,2836,3842,5035,6415,8000,9786,11785,14005,16451,19134,22051,25211,28621,32289,36215,40409,44869,49603,54621,59912,65535
28 | 0.000408 | 0.000098 | 0.000129 | 0,186,410,763,1243,1865,2635,3567,4662,5938,7388,9034,10870,12910,15157,17614,20294,23191,26324,29681,33285,37124,41214,45555,50148,55007,60114,65535
29 | 0.000418 | 0.000091 | 0.000122 | 0,180,390,720,1166,1743,2457,3319,4333,5509,6851,8366,10060,11938,14007,16271,18737,21406,24286,27379,30689,34222,37981,41970,46195,50657,55366,60307,65535
42 | 0.000174 | 0.000043 | 0.000056 | 0,123,246,410,627,897,1224,1612,2064,2583,3170,3826,4558,5365,6250,7212,8258,9385,10602,11901,13289,14769,16342,18005,19765,21620,23574,25630,27778,30038,32395,34859,37431,40105,42891,45785,48794,51909,55140,58486,61945,65535
My results didn’t match his mean error numbers in the 6th decimal place, but they’re close enough that I can tell we’re using the same basic logic. As you can see, the Facebook curve stats (marked with an asterisk) do show a larger mean error, but the max error and RMSE are lower, meaning their curve is a slightly better fit overall based on this measure. Essentially, that curve has a greater overall error, but the error is distributed more evenly, with fewer large individual errors. Their max error is also lower than that of the immediately neighboring curves with more or fewer points, which is good, but those curves weren’t optimized to minimize max error, so that might not be meaningful.
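The absolute-error measurement described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the tables: it assumes plain linear interpolation between evenly spaced points, the standard sRGB constants, and the 26-point values listed for TinyRGB/c2 in the "Show Me Those Curves" tables. The function names are made up for this example.

```python
import math

# 26-point values listed for the TinyRGB/c2 curve later in this post (16-bit).
TINYRGB_26 = [
    0, 203, 457, 867, 1426, 2155, 3062, 4159, 5457, 6964, 8689, 10640, 12824,
    15250, 17925, 20855, 24045, 27504, 31237, 35259, 39548, 44137, 49021,
    54211, 59696, 65535,
]


def srgb_to_linear(c: float) -> float:
    """Reference sRGB inverse gamma function for c in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def eval_curve(points, c: float) -> float:
    """Linearly interpolate an evenly spaced 16-bit curve at input c in [0, 1]."""
    pos = c * (len(points) - 1)
    i = min(int(pos), len(points) - 2)  # clamp so c == 1.0 uses the last segment
    frac = pos - i
    return (points[i] + frac * (points[i + 1] - points[i])) / 65535.0


def abs_error_stats(points, samples: int = 256):
    """Return (max, mean, RMS) absolute error over evenly spaced input samples."""
    errors = []
    for k in range(samples):
        c = k / (samples - 1)
        errors.append(abs(eval_curve(points, c) - srgb_to_linear(c)))
    max_err = max(errors)
    mean_err = sum(errors) / len(errors)
    rms_err = math.sqrt(sum(e * e for e in errors) / len(errors))
    return max_err, mean_err, rms_err


print("max %.6f  mean %.6f  rms %.6f" % abs_error_stats(TINYRGB_26))
```

Passing a different point list to `abs_error_stats` gives the same three statistics for any of the other curves.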
But actually, these numbers still aren’t the best measure of the curves’ accuracy. Because the sRGB gamma curve is intentionally very much not linear (except for that small bit at the start), a relatively small absolute error at the bottom end has a greater impact on image fidelity than a larger absolute error at the top of the curve. For example, the output value for an input of 1/255 should be 0.000304. An error of 0.000449 (the max error from the TinyRGB curve) on that value would be huge. At the top of the curve, where the output for 254/255 should be 0.991102, that same error would be insignificant. A more useful measure here would be the error relative to the correct value, not the absolute error.
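To put numbers on that, here is a small sketch that applies the same absolute error to both ends of the curve. It assumes the standard sRGB constants; `relative_impact` is a helper name invented for this example.

```python
def srgb_to_linear(c: float) -> float:
    """Reference sRGB inverse gamma function for c in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_impact(level: int, abs_error: float) -> float:
    """Absolute error expressed as a fraction of the correct output for an 8-bit level."""
    return abs_error / srgb_to_linear(level / 255)


MAX_ABS_ERROR = 0.000449  # max absolute error quoted above for the TinyRGB curve

for level in (1, 254):
    correct = srgb_to_linear(level / 255)
    impact = relative_impact(level, MAX_ABS_ERROR)
    print(f"{level:3d}/255 -> {correct:.6f}, same absolute error is {impact:.2%} of the value")
```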
Going beyond that, it’s important to understand what the curve is used for and what an error actually means as far as image fidelity. This curve is included in an ICC profile that’s meant to be embedded in images so that they can be converted to other colorspaces. Since we know the curve is going to have errors, it’s best to optimize the placement of the points so that the error has as little visual impact as possible when the image is converted.
That conversion process goes like this:
- Convert source RGB values to Linear RGB. This is what the curve is used for. It should approximate the inverse gamma function from the sRGB spec. That’s where the errors are introduced – you can’t precisely replicate the sRGB curve with nothing but straight lines.
- Convert Linear RGB to XYZ. This is done using the XYZ values for the red, green, and blue primaries that are also included in the profile.
- Convert those XYZ values to Linear RGB in the target colorspace using its XYZ primaries.
- Run that Linear RGB through the inverse of the curve in the target profile to arrive at the final target RGB values.
The simplest version of this process would be an identity transform from the sRGB-compatible colorspace to true sRGB. If everything goes right, the output values will be identical to the input.
That’s my first criterion for the curve. It must support a round-trip for every value 0-255 through the profile curve and then back through the true sRGB gamma function. If any value changes on round-trip, the curve is not sRGB-compatible.
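A round-trip check along those lines might look like the following. It is a sketch under assumptions: linear interpolation of the profile curve, the standard sRGB forward gamma function for the return trip, and rounding to the nearest 8-bit level. The point list is the one given for TinyRGB/c2 in the tables later in this post, and the function names are hypothetical.

```python
# 26-point values listed for the TinyRGB/c2 curve later in this post (16-bit).
TINYRGB_26 = [
    0, 203, 457, 867, 1426, 2155, 3062, 4159, 5457, 6964, 8689, 10640, 12824,
    15250, 17925, 20855, 24045, 27504, 31237, 35259, 39548, 44137, 49021,
    54211, 59696, 65535,
]


def linear_to_srgb(v: float) -> float:
    """True sRGB gamma function (linear light back to encoded value)."""
    if v <= 0.0031308:
        return v * 12.92
    return 1.055 * v ** (1 / 2.4) - 0.055


def eval_curve(points, c: float) -> float:
    """Linearly interpolate an evenly spaced 16-bit curve at input c in [0, 1]."""
    pos = c * (len(points) - 1)
    i = min(int(pos), len(points) - 2)
    frac = pos - i
    return (points[i] + frac * (points[i + 1] - points[i])) / 65535.0


def max_round_trip_error(points, levels: int = 256) -> int:
    """Largest change, in levels, after curve -> linear -> true sRGB gamma -> rounding."""
    worst = 0
    for k in range(levels):
        linear = eval_curve(points, k / (levels - 1))
        back = round(linear_to_srgb(linear) * (levels - 1))
        worst = max(worst, abs(back - k))
    return worst


print("max round-trip error:", max_round_trip_error(TINYRGB_26))
```

A result of 0 means every 8-bit value survives the trip unchanged; anything larger means the curve fails the criterion.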
Measuring Visual Error
The round-trip test is the absolute minimum that the curve should pass, but we can actually get a pretty good idea of the curve’s visual accuracy beyond that. Keep in mind that sRGB is a relatively compact colorspace. When converting to a colorspace with a wider gamut, a difference that might not result in an error in sRGB might throw a color off by quite a lot in a colorspace that is larger and more spread out.
I think Facebook was on the right track with their design. They mentioned validating the error in their curve by using the ΔE-CIE94 measure. That’s a measure of color difference based in the L*a*b* colorspace, which is designed to be perceptually uniform. So instead of measuring numbers from the curve output and just picking the closest ones, they actually verified that the numbers they picked got close visually to the reference values. L*a*b* is calculated directly from XYZ values, so it’s also a good test of the exact conversions that will happen when the profile is used for real.
I got the impression from their post that they tuned the curve first and then used the ΔE-CIE94 measures to make sure the final results were good enough. My plan was to integrate the visual measures into the tuning process itself, so that the results would not just be good enough, but rather would be the best possible for a given number of curve points.
To that end, I decided to take a similar but simpler approach. ΔE-CIE94 is complicated to calculate because it has some refinements to the original ΔE-CIE76 spec to deal with irregularities in the model that show up in certain hue ranges. Furthermore, to test the entire RGB space, I would have to do 16.7M comparisons (at 8-bit input precision) with that complicated calculation for each candidate curve. I realized I could simplify things greatly by working with the grey values 0-255. Since sRGB uses the same curve for all three color channels, grey is as good as any color for testing the curve.
Limiting to just the grey values allows a simpler calculation of L* since it can be directly calculated from the Y value in XYZ, and a* and b* will always be 0. That meant I could look just at ΔL* and have a very good idea what the perceptual difference was between the reference value and the calculated value from the curve candidates. And to make that comparison as accurate as possible, I used the ΔL* adjustments from the even-newer ΔE-CIE2000, which gives more importance to midtones, reducing the visual difference measure for very dark or very light colors.
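For grey values, that measure reduces to a few lines. The sketch below assumes the standard CIE L* formula with white at Y = 1 and the CIEDE2000 lightness weighting S_L evaluated at the mean of the two L* values (with k_L = 1). The function names are illustrative and the point list is the TinyRGB/c2 one from the tables later in this post.

```python
import math

TINYRGB_26 = [
    0, 203, 457, 867, 1426, 2155, 3062, 4159, 5457, 6964, 8689, 10640, 12824,
    15250, 17925, 20855, 24045, 27504, 31237, 35259, 39548, 44137, 49021,
    54211, 59696, 65535,
]


def srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def eval_curve(points, c: float) -> float:
    pos = c * (len(points) - 1)
    i = min(int(pos), len(points) - 2)
    return (points[i] + (pos - i) * (points[i + 1] - points[i])) / 65535.0


def lightness(y: float) -> float:
    """CIE L* from relative luminance Y, with white at Y = 1."""
    if y <= 216 / 24389:
        return y * 24389 / 27
    return 116 * y ** (1 / 3) - 16


def delta_l(y_ref: float, y_test: float) -> float:
    """Lightness difference scaled by the CIEDE2000 S_L term (k_L assumed to be 1)."""
    l_ref, l_test = lightness(y_ref), lightness(y_test)
    l_mean = (l_ref + l_test) / 2
    s_l = 1 + 0.015 * (l_mean - 50) ** 2 / math.sqrt(20 + (l_mean - 50) ** 2)
    return abs(l_test - l_ref) / s_l


def max_delta_l(points, levels: int = 256) -> float:
    """Worst grey-axis lightness difference between the curve and the true sRGB function."""
    return max(
        delta_l(srgb_to_linear(k / (levels - 1)), eval_curve(points, k / (levels - 1)))
        for k in range(levels)
    )


print(f"max delta L* for the TinyRGB curve: {max_delta_l(TINYRGB_26):.6f}")
```

Because grey has a* = b* = 0 for both the reference and the test value, the Y component alone is enough here and no matrix math is needed.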
So, to review, I ended up with three measures for evaluating and tuning the curves. In order of importance, those are:
- The round-trip test through the sRGB gamma function
- The ΔL* for reference vs calculated values
- The relative error in the curve output values
I decided to keep the relative error from the curve output as a measure, because the closer the curve is to the correct sRGB gamma curve numerically, the more points can be interpolated relatively error-free. I’ll explain that more later, but basically, the round-trip test and ΔL* are best for determining the max error and tolerances, but the relative error is best for fitting the curve for points in-between.
With all that explanation out of the way, I’ll get back to the curves from Pippin’s sRGBz post. Here are the stats for those curves using the measures I described. Again, the TinyRGB curve is marked with an asterisk. And the left three error columns are now relative error instead of absolute.
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
23 | 0.039987 | 0.002466 | 0.005752 | 0.189551 | 0.017885 | 0.029098 | 1
24 | 0.040603 | 0.002357 | 0.005641 | 0.179425 | 0.016492 | 0.027182 | 1
25 | 0.031010 | 0.002205 | 0.005345 | 0.124504 | 0.015105 | 0.024044 | 0
*26 | 0.034171 | 0.001978 | 0.005315 | 0.095100 | 0.014204 | 0.021270 | 0
26 | 0.029402 | 0.002035 | 0.005077 | 0.111436 | 0.013970 | 0.022130 | 0
27 | 0.031464 | 0.001920 | 0.004796 | 0.120256 | 0.012911 | 0.020769 | 0
28 | 0.029564 | 0.001918 | 0.004616 | 0.104781 | 0.011936 | 0.018717 | 0
29 | 0.028636 | 0.001729 | 0.004265 | 0.093921 | 0.011154 | 0.017872 | 0
42 | 0.015034 | 0.000887 | 0.002183 | 0.040349 | 0.005297 | 0.008399 | 0
Using these measures, we can learn much more about the real-world usefulness of the curves. First of all, you can see that Pippin’s 23- and 24-point curves, despite having fairly low mean and RMS error values, failed the round-trip test. The Max RT Error of 1 means the pixels were offset from their correct values by a max of 1, but that’s still not good enough. Next, you can see that the Max ΔL* from the TinyRGB curve is lower than all but the two largest of Pippin’s proposed curves. Looking at the columns on the left, you can see that Pippin’s 26-point curve is a better fit to the reference curve based purely on the relative error numbers, and that makes sense given that that’s how he optimized them. He looked only at the raw numbers, while the Facebook team considered the visual impact of the numbers.
So based on that, the TinyRGB curve looks pretty impressive. It passes the round-trip test and was obviously tuned for visual accuracy. But can we do better? Of course we can :)
But first, I’ll explain one more thing. What does the ΔL* value mean in real-world terms?
The Facebook TinyRGB post said that their ΔE-CIE94 testing showed that their error level was less than half of what is perceptible to humans. Under the CIE76 definition of ΔE, a value of 1 is generally considered the minimal noticeable difference between colors, and ΔE is defined as Sqrt(ΔL*² + Δa*² + Δb*²). If we were to assume a target ΔE of 1, then knowing that our Δa* and Δb* values are always 0, we could say that the minimal noticeable ΔL* should be Sqrt(1/3), or 0.57735. However, the newer revisions to ΔE complicate things by adding a scaling factor to each color component, and ΔE-CIE2000 complicates things a bit more by adjusting the color difference so that midtones are more heavily weighted. That makes it more difficult to find a threshold value for ΔL*. I decided to do some ad-hoc testing using real grey values from the real sRGB to lend context. I calculated the minimum and maximum ΔL* for all adjacent shades of grey in 8-bit sRGB. The minimum value was 0.157124, which was the difference between grey levels 0 and 1. The max was 0.397609, between grey levels 117 and 118.
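The adjacent-grey comparison can be reproduced with a short script. As before, this is a sketch that assumes the standard CIE L* formula with white at Y = 1 and the CIEDE2000 S_L weighting taken at the mean L* of each pair; the function names are invented for this example.

```python
import math


def srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lightness(y: float) -> float:
    """CIE L* from relative luminance Y, with white at Y = 1."""
    if y <= 216 / 24389:
        return y * 24389 / 27
    return 116 * y ** (1 / 3) - 16


def delta_l(y_a: float, y_b: float) -> float:
    """Lightness difference scaled by the CIEDE2000 S_L term (k_L assumed to be 1)."""
    l_a, l_b = lightness(y_a), lightness(y_b)
    l_mean = (l_a + l_b) / 2
    s_l = 1 + 0.015 * (l_mean - 50) ** 2 / math.sqrt(20 + (l_mean - 50) ** 2)
    return abs(l_b - l_a) / s_l


def adjacent_grey_steps():
    """Weighted lightness difference between each 8-bit grey level k and k + 1."""
    linear = [srgb_to_linear(k / 255) for k in range(256)]
    return [delta_l(linear[k], linear[k + 1]) for k in range(255)]


steps = adjacent_grey_steps()
smallest = min(range(255), key=lambda k: steps[k])
largest = max(range(255), key=lambda k: steps[k])
print(f"min {steps[smallest]:.6f} between grey {smallest} and {smallest + 1}")
print(f"max {steps[largest]:.6f} between grey {largest} and {largest + 1}")
```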
Looking back at the curves that failed the round-trip test, you can see those had max ΔL* values of 0.179425 and 0.189551, so it’s easy to imagine why they would have had values change on the round-trip. To make it easier to picture the difference, though, here’s what those greys look like. First a pair of boxes at grey values 0 and 1:
And now a pair at 117 and 118:
On my laptop, which has an above-average-quality screen, in a dark room, I can see the line between 117 and 118 quite clearly. The line between 0 and 1, I can’t really see at all. Depending on your screen, viewing environment, and eyes, you may or may not see any difference.
Based on my sample size of one (totally statistically significant – to me, ha), the minimum noticeable difference in ΔL* seems to be somewhere between 0.16 and 0.40… Let’s call it 0.2-ish to be safe. The max ΔL* of the TinyRGB curve is right around half that, so that checks out. We’re going to do better than that by far, but I wanted to give you an idea what that number means in the real world since it was a key measurement in my testing.
As I mentioned before, I reached the same conclusion Pippin did regarding the magic of the 26-point curve. I did it by testing curves at all sizes from 16 to 255 points and comparing them. The curves were tuned using the same measures I detailed above. The first priority was round-trip accuracy, second was to minimize ΔL*, and third was to fit the curve by minimizing the RMS relative error. This required an iterative approach to curve optimization, where certain points were locked based on their impact on ΔL* and the others were allowed to move until the best-fitting curve was found. My solver found a few interesting ones.
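To give a feel for what iterative tuning of this kind involves, here is a deliberately simplified sketch. It is not the solver used for the curves in this post: it is a plain greedy coordinate descent that starts from rounded samples of the true sRGB function, ranks candidates by round-trip error first and RMS relative error second, and leaves out the ΔL* step and the point-locking logic entirely. All names are hypothetical.

```python
import math


def srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(v: float) -> float:
    return v * 12.92 if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055


def eval_curve(points, c: float) -> float:
    pos = c * (len(points) - 1)
    i = min(int(pos), len(points) - 2)
    return (points[i] + (pos - i) * (points[i + 1] - points[i])) / 65535.0


def score(points, levels: int = 256):
    """(max round-trip error, RMS relative error); tuples compare in that priority order."""
    worst_rt, sq_sum = 0, 0.0
    for k in range(1, levels):  # level 0 is skipped: relative error is undefined there
        c = k / (levels - 1)
        ref = srgb_to_linear(c)
        got = eval_curve(points, c)
        back = round(linear_to_srgb(got) * (levels - 1))
        worst_rt = max(worst_rt, abs(back - k))
        sq_sum += ((got - ref) / ref) ** 2
    return worst_rt, math.sqrt(sq_sum / (levels - 1))


def sampled_start(n_points: int):
    """Starting guess: the true sRGB function sampled and rounded to 16 bits."""
    return [round(65535 * srgb_to_linear(i / (n_points - 1))) for i in range(n_points)]


def tune(points, max_passes: int = 10):
    """Nudge interior points up or down, keeping any change that improves the score."""
    points = list(points)
    best = score(points)
    for _ in range(max_passes):
        improved = False
        for i in range(1, len(points) - 1):  # endpoints stay at 0 and 65535
            for step in (-16, -4, -1, 1, 4, 16):
                trial = list(points)
                trial[i] += step
                if not (trial[i - 1] <= trial[i] <= trial[i + 1]):
                    continue  # keep the curve monotonic
                trial_score = score(trial)
                if trial_score < best:
                    points, best, improved = trial, trial_score, True
        if not improved:
            break
    return points, best


start = sampled_start(26)
tuned, tuned_score = tune(start)
print("start:", score(start))
print("tuned:", tuned_score)
```

A greedy search like this can get stuck in local minima and will not reproduce the curves listed in this post, but it shows how the round-trip test can act as a hard priority while a fit measure drives the point placement.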
Show Me Those Curves
I’ll start with the smallest useable curves I was able to create.
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error | Point Values
19 | 0.041959 | 0.003564 | 0.007399 | 0.139496 | 0.026601 | 0.038015 | 0 | 0,279,753,1521,2622,4077,5920,8169,10853,13987,17596,21693,26300,31431,37102,43328,50128,57494,65535
20 | 0.035090 | 0.003569 | 0.007435 | 0.128757 | 0.024572 | 0.035288 | 0 | 0,263,693,1387,2358,3664,5297,7296,9672,12449,15641,19264,23335,27867,32875,38371,44368,50882,57905,65535
21 | 0.033688 | 0.003200 | 0.006765 | 0.115426 | 0.021854 | 0.031353 | 0 | 0,250,638,1263,2146,3309,4773,6557,8678,11152,13995,17221,20842,24872,29323,34206,39534,45316,51565,58276,65535
22 | 0.033893 | 0.003218 | 0.007011 | 0.108881 | 0.020180 | 0.028803 | 0 | 0,237,594,1159,1959,3008,4325,5928,7832,10050,12598,15485,18727,22331,26312,30677,35438,40603,46183,52189,58613,65535
23 | 0.034801 | 0.002913 | 0.006443 | 0.106786 | 0.018459 | 0.026617 | 0 | 0,227,554,1071,1798,2749,3940,5389,7106,9106,11401,14000,16917,20159,23738,27661,31938,36580,41589,46980,52759,58920,65535
24 | 0.031175 | 0.003083 | 0.007205 | 0.089015 | 0.017114 | 0.024290 | 0 | 0,215,520,994,1657,2523,3607,4922,6479,8291,10369,12721,15358,18288,21521,25065,28928,33116,37639,42503,47714,53282,59201,65535
By only considering options that allowed the round-trip test to pass, I was able to create viable curves with as few as 19 points. You can see that each point added reduces ΔL*, though, so more is better at this stage.
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error | Point Values
25 | 0.035425 | 0.002291 | 0.005597 | 0.091829 | 0.015423 | 0.022824 | 0 | 0,210,487,926,1534,2327,3317,4515,5934,7583,9472,11610,14005,16666,19600,22815,26320,30117,34218,38626,43349,48393,53765,59459,65535
*26 | 0.034171 | 0.001978 | 0.005315 | 0.095100 | 0.014204 | 0.021270 | 0 | 0,203,457,867,1426,2155,3062,4159,5457,6964,8689,10640,12824,15250,17925,20855,24045,27504,31237,35259,39548,44137,49021,54211,59696,65535
26 | 0.032400 | 0.002235 | 0.005239 | 0.079740 | 0.014239 | 0.020672 | 0 | 0,201,459,866,1426,2155,3062,4159,5457,6964,8689,10639,12824,15250,17925,20854,24045,27504,31237,35249,39548,44137,49022,54211,59697,65535
27 | 0.026077 | 0.002531 | 0.005867 | 0.069934 | 0.013876 | 0.019952 | 0 | 0,191,435,819,1329,2003,2837,3846,5037,6419,8001,9787,11788,14008,16455,19134,22052,25213,28625,32291,36218,40409,44871,49607,54624,59917,65535
28 | 0.025899 | 0.002487 | 0.006214 | 0.063570 | 0.012585 | 0.017635 | 0 | 0,183,415,765,1245,1867,2638,3568,4666,5938,7392,9036,10873,12913,15159,17618,20296,23195,26325,29685,33286,37127,41217,45557,50152,55009,60120,65535
29 | 0.024357 | 0.002259 | 0.005617 | 0.054125 | 0.011644 | 0.016265 | 0 | 0,177,395,723,1169,1746,2460,3321,4336,5511,6853,8368,10062,11942,14011,16275,18740,21409,24288,27381,30691,34225,37984,41974,46199,50661,55367,60310,65535
As more points are added, the ΔL* continues to go down. I was able to create 24- and 25-point curves with lower max ΔL* than the TinyRGB 26-point curve (marked with an asterisk again) as well as improve on nearly all the stats with a different 26-point curve of my own. But neither is as good as the 27 or 28 or 29, which is to say… there’s nothing special at all about 26 points.
Outside the small blip between 24 and 25 points, my solver was able to keep improving with each additional point all the way up to 32 points. Beyond that size, reductions in ΔL* got more difficult to come by, and curve performance was more difficult to predict. The sizes that outperform their neighbors make interesting candidates if you’re looking to optimize size/quality ratio, like you might do if you were trying to make a compact sRGB-compatible profile. Here are stats from a few such curves:
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error | Point Values
32 | 0.018609 | 0.001701 | 0.004036 | 0.039496 | 0.009391 | 0.013111 | 0 | 0,161,345,618,985,1453,2030,2724,3539,4481,5554,6763,8115,9611,11256,13055,15012,17130,19412,21862,24484,27280,30256,33410,36750,40276,43993,47902,52005,56309,60807,65535
42 | 0.007896 | 0.000696 | 0.001455 | 0.025409 | 0.005082 | 0.007290 | 0 | 0,124,248,412,629,899,1225,1614,2066,2584,3170,3828,4559,5366,6250,7214,8259,9388,10602,11902,13291,14771,16342,18007,19766,21622,23575,25629,27782,30038,32397,34860,37430,40107,42892,45787,48793,51911,55141,58487,61945,65535
56 | 0.007696 | 0.000515 | 0.001255 | 0.013777 | 0.002988 | 0.004177 | 0 | 0,92,183,284,410,566,751,966,1215,1497,1813,2167,2556,2984,3452,3958,4507,5096,5729,6406,7126,7892,8704,9562,10468,11423,12425,13478,14581,15734,16940,18197,19507,20872,22289,23762,25290,26873,28513,30210,31964,33777,35647,37577,39567,41616,43727,45899,48131,50427,52785,55205,57690,60239,62850,65535
63 | 0.003646 | 0.000347 | 0.000720 | 0.009950 | 0.002294 | 0.003162 | 0 | 0,82,163,247,350,475,623,794,990,1212,1459,1734,2038,2370,2732,3124,3547,4002,4489,5009,5562,6150,6772,7430,8124,8853,9620,10424,11266,12146,13065,14024,15022,16061,17140,18261,19422,20627,21873,23162,24495,25871,27292,28757,30267,31821,33422,35069,36762,38501,40288,42123,44005,45935,47914,49941,52018,54144,56321,58547,60824,63151,65535
124 | 0.005790 | 0.000191 | 0.000765 | 0.003821 | 0.000682 | 0.000986 | 0 | 0,41,82,124,165,206,250,300,355,416,482,554,632,716,806,902,1005,1114,1230,1353,1482,1619,1762,1913,2071,2236,2409,2589,2777,2972,3176,3387,3606,3834,4069,4313,4565,4825,5094,5372,5658,5953,6256,6568,6890,7220,7559,7908,8265,8632,9008,9394,9789,10194,10608,11032,11465,11909,12362,12825,13298,13781,14274,14777,15291,15815,16349,16893,17448,18014,18589,19176,19773,20381,21000,21629,22269,22921,23583,24256,24941,25636,26343,27061,27790,28530,29282,30045,30820,31607,32404,33214,34035,34868,35713,36570,37438,38318,39211,40115,41031,41960,42900,43853,44818,45795,46785,47787,48801,49828,50867,51919,52983,54060,55150,56252,57368,58495,59636,60790,61956,63136,64328,65535
182 | 0.001022 | 0.000092 | 0.000230 | 0.003107 | 0.000440 | 0.000736 | 0 | 0,28,56,84,112,140,168,196,225,256,290,326,365,405,449,496,544,597,651,708,769,831,898,966,1038,1113,1191,1273,1356,1444,1534,1628,1726,1825,1930,2036,2147,2261,2377,2499,2623,2751,2882,3017,3156,3297,3444,3593,3746,3904,4064,4229,4397,4570,4746,4926,5110,5298,5489,5686,5885,6090,6297,6510,6726,6946,7171,7399,7632,7869,8110,8356,8606,8860,9119,9381,9649,9920,10197,10477,10762,11051,11345,11644,11946,12254,12566,12882,13204,13529,13860,14195,14534,14880,15228,15583,15941,16304,16673,17046,17424,17807,18194,18587,18984,19387,19793,20206,20623,21045,21472,21904,22341,22784,23230,23684,24140,24603,25071,25543,26022,26505,26993,27487,27985,28490,28998,29514,30033,30558,31089,31624,32166,32712,33264,33822,34384,34953,35525,36105,36689,37279,37875,38475,39083,39694,40312,40935,41563,42198,42838,43483,44135,44791,45455,46122,46796,47476,48161,48853,49549,50252,50960,51674,52395,53119,53852,54589,55332,56082,56836,57598,58364,59137,59916,60700,61492,62288,63091,63899,64714,65535
212 | 0.001650 | 0.000118 | 0.000357 | 0.002817 | 0.000449 | 0.000707 | 0 | 0,24,48,72,96,120,144,168,192,217,243,270,300,332,365,400,437,476,517,560,605,652,701,752,805,861,918,978,1040,1104,1170,1239,1310,1383,1459,1536,1617,1700,1785,1873,1962,2055,2150,2248,2348,2450,2555,2663,2774,2886,3002,3120,3242,3365,3492,3620,3753,3887,4025,4164,4308,4453,4602,4754,4908,5065,5225,5389,5554,5723,5895,6070,6248,6429,6612,6799,6990,7182,7379,7577,7780,7986,8194,8406,8620,8838,9060,9284,9512,9742,9976,10214,10454,10698,10945,11196,11449,11706,11967,12230,12498,12768,13042,13319,13599,13884,14171,14462,14756,15054,15356,15660,15969,16280,16596,16915,17237,17563,17892,18226,18563,18903,19247,19594,19946,20300,20659,21021,21387,21756,22130,22506,22887,23271,23660,24051,24447,24846,25250,25657,26067,26482,26900,27323,27749,28178,28612,29050,29492,29937,30386,30839,31296,31758,32223,32691,33165,33641,34123,34607,35096,35588,36086,36587,37092,37600,38113,38631,39152,39677,40206,40739,41277,41819,42364,42914,43468,44027,44589,45155,45726,46301,46880,47463,48050,48642,49238,49839,50443,51051,51664,52281,52903,53528,54159,54793,55432,56075,56722,57373,58030,58690,59355,60023,60697,61375,62057,62744,63435,64130,64830,65535
You can see that at 32 points, the max ΔL* is less than half that of the TinyRGB/c2 curve, which makes the increase in size well worth it. Doubling(-ish) the size to 63 points reduces error a further ~4x. Past that, it becomes increasingly expensive to make quality gains, with doubling size yielding a ~2.5x error improvement. Beyond that, it takes lots more points to improve accuracy, which peaked in the 212-point curve.
At this point, an obvious question comes up: Why even bother with 212? Why not just use a 256-point curve tag and be done with it?
Bigger Isn’t Always Better
Intuitively, one might expect that the best curve fit for an 8-bit image would have 256 points. Each point could contain the exact best output value for each input and no interpolation would be required. But look what happens when we compare a 256-point curve to the best performers from above.
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
124 | 0.005790 | 0.000191 | 0.000765 | 0.003821 | 0.000682 | 0.000986 | 0
182 | 0.001022 | 0.000092 | 0.000230 | 0.003107 | 0.000440 | 0.000736 | 0
212 | 0.001650 | 0.000118 | 0.000357 | 0.002817 | 0.000449 | 0.000707 | 0
256 | 0.005447 | 0.000210 | 0.000802 | 0.004125 | 0.000646 | 0.001042 | 0
Because the curve points are stored as 16-bit unsigned integer values in the ICC profile (the ICC uInt16Number type), there’s a natural limit to the output precision. That limit is 1/65535, or 0.0000152590. Remember that at the bottom end of the sRGB curve, the output values are very, very small. For example, the value for an input of 2/255 should be 0.0006070540. Quantized to 16 bits, that value becomes 40/65535, which is actually 0.0006103609. That value is higher than the correct one by 0.5447%, which is the max error shown above. And there are several values with that error – I didn’t just pick the worst one. But notice the 182- and 212-point curves have much lower max errors. The same is reflected in the ΔL*. Although it’s tiny on the 256-point curve, the others still do better. Because those have fewer points, the output values have to be interpolated between two points and can actually fall between the values that would be possible to express explicitly at 16-bit precision. So, in this case, less can be more.
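The quantization arithmetic for that 2/255 example is easy to reproduce. A minimal sketch, assuming the standard sRGB constants and round-to-nearest quantization; `quantize16` is a name made up for this example.

```python
def srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def quantize16(value: float):
    """Round a 0..1 value to the nearest 16-bit code; returns (code, decoded value)."""
    code = round(value * 65535)
    return code, code / 65535


exact = srgb_to_linear(2 / 255)
code, stored = quantize16(exact)
rel_error = (stored - exact) / exact

print(f"16-bit step size: {1 / 65535:.10f}")
print(f"exact value:      {exact:.10f}")
print(f"stored value:     {stored:.10f} ({code}/65535)")
print(f"relative error:   {rel_error:.4%}")
```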
Carrying that further, consider the 1024-point curve used in the standard sRGB profile. Once again, I will reference Elle Stone’s site, which has a detailed survey of a variety of common sRGB profiles. She found that the majority of profiles use that same 1024-point curve. She also explains the precision issue, which she refers to as ‘hexadecimal rounding’. I call it ‘16-bit quantization’. Potato, potato.
Let’s see what happens when we use that 1024-point curve to get output for 8-bit input values. And let’s see what happens if we go even bigger and use a 4096-point curve from Elle’s custom profile collection.
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
124 | 0.005790 | 0.000191 | 0.000765 | 0.003821 | 0.000682 | 0.000986 | 0
182 | 0.001022 | 0.000092 | 0.000230 | 0.003107 | 0.000440 | 0.000736 | 0
212 | 0.001650 | 0.000118 | 0.000357 | 0.002817 | 0.000449 | 0.000707 | 0
256 | 0.005447 | 0.000210 | 0.000802 | 0.004125 | 0.000646 | 0.001042 | 0
1024 | 0.008405 | 0.000205 | 0.000996 | 0.003993 | 0.000475 | 0.000819 | 0
4096 | 0.008405 | 0.000175 | 0.000860 | 0.003054 | 0.000472 | 0.000782 | 0
You can see that the max error has actually gotten worse with the bigger curves. The reason for this is that with more points defined in the curve, their values get closer together, and the quantization/rounding error becomes more significant. If we look at the linear segment of the 1024-point curve, we can see the issue.
0,5,10,15,20,25,30,35,40,45,50,55,59,64,69,74,79,84,89,94,99,104,109,114,119,124,129,134,139,144,149,154,159,164,169,174,178,183,188,193
Notice that there’s a nice even increment of 5 between each step… except for two times where it’s 4. That uneven step hints at the fact that the slope of the line allowed by the quantization to 16 bits is not quite right. The only way to make it better is to remove points so that the slope can be represented correctly. Here is the same segment from the 212-point curve, which has even steps throughout.
0,24,48,72,96,120,144,168,192
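The uneven steps fall straight out of the arithmetic. The sketch below assumes a curve built by simply rounding the exact sRGB values to 16 bits, which happens to reproduce both segments quoted above; the 212-point curve in this post came from a solver, not from this function, and the helper names are invented for illustration.

```python
def srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def sampled_curve(n_points: int, count: int):
    """First `count` points of an n-point curve made by rounding exact sRGB values."""
    return [round(65535 * srgb_to_linear(i / (n_points - 1))) for i in range(count)]


def steps(values):
    """Differences between consecutive curve points."""
    return [b - a for a, b in zip(values, values[1:])]


for n_points, count in ((1024, 40), (212, 9)):
    points = sampled_curve(n_points, count)
    ideal_step = 65535 / ((n_points - 1) * 12.92)  # exact slope of the linear segment
    print(f"{n_points} points: ideal step {ideal_step:.4f}, "
          f"actual steps {sorted(set(steps(points)))}")
```

With 1024 points the ideal increment is a little under 5, so the rounded values have to drop a step every so often to stay on the line. With 212 points the ideal increment is just over 24, and every rounded value in the segment lands on a clean multiple of 24.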
The extra resolution in the 4096-point curve moves the error around a bit, so it manages a better ΔL* than the 1024-point, but it still trails the 212-point curve in all stats. That curve also has even more serious rounding issues that we haven’t encountered yet, because we’ve only been looking up 256 values in that curve. I’ll come back to that in a bit.
A Change of Direction
I must admit, I was rather surprised when I learned there were curve matches that exceeded the accuracy of the standard 1024-point curve used in so many profiles.
The initial goal I had was to find a better solution than TinyRGB/c2 for a compact sRGB-compatible profile. That profile is used almost exclusively to convert JPEG images to other colorspaces, so the accuracy of its output when used with 8-bit input is the most important thing. For that purpose, the 212-point curve turns out to be the most accurate, and that might make it perfect for image embedding if you don’t mind its size, which comes out to 796 bytes in a minimal profile packed using the technique I described in my last post. That’s about a quarter the size of the standard sRGB profile, with increased accuracy – a true win/win. But there’s a reasonable case to be made for a smaller profile as well, especially for thumbnail-sized images. If you have a 4KB JPEG, even 796 bytes for the profile seems heavy. For that case, I can improve on TinyRGB significantly with just a few more curve points.
I’ll get back to the curves I picked for my compact sRGB-compatible profiles later, but the accuracy of the 182- and 212-point curves got me wondering whether they might also work better as a target profile than the standard sRGB profile does or whether they might be appropriate for higher-bit-depth images. I decided to test them again, using more input samples this time. I discovered that the tuning I had done to optimize for 8-bit input hurt the overall fit of the curves a tiny bit, so they didn’t give quite as good results with more samples. So, I ran them through my solver one more time and asked it to tune for 1024 samples instead of 256. There was a very slight drop in their 8-bit accuracy after that was done, but the curves continued to perform well. And their performance at higher resolution beat everything.
Numbers, Numbers, Numbers
With the final set of interesting curves identified, I set out to do comprehensive comparisons. There are lots of numbers here, so feel free to skip this section if you’re the type whose eyes glaze over when they see too many numbers. Come back for the conclusions and the final profiles, though. They’ll be interesting, I promise.
Here are the 8-bit results again for my final set of interesting curves, compared with the standard 1024- and 4096-point curves as well as the TinyRGB curve (again with the *). I have marked the refined 182- and 212-point curves with a caret (^) for comparison with the initial 8-bit tuned ones.
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
19 | 0.041959 | 0.003564 | 0.007399 | 0.139496 | 0.026601 | 0.038015 | 0
20 | 0.035090 | 0.003569 | 0.007435 | 0.128757 | 0.024572 | 0.035288 | 0
*26 | 0.034171 | 0.001978 | 0.005315 | 0.095100 | 0.014204 | 0.021270 | 0
26 | 0.032400 | 0.002235 | 0.005239 | 0.079740 | 0.014239 | 0.020672 | 0
32 | 0.018609 | 0.001701 | 0.004036 | 0.039496 | 0.009391 | 0.013111 | 0
42 | 0.007896 | 0.000696 | 0.001455 | 0.025409 | 0.005082 | 0.007290 | 0
56 | 0.007696 | 0.000515 | 0.001255 | 0.013777 | 0.002988 | 0.004177 | 0
63 | 0.003646 | 0.000347 | 0.000720 | 0.009950 | 0.002294 | 0.003162 | 0
124 | 0.005790 | 0.000191 | 0.000765 | 0.003821 | 0.000682 | 0.000986 | 0
182 | 0.001022 | 0.000092 | 0.000230 | 0.003107 | 0.000440 | 0.000736 | 0
^182 | 0.001072 | 0.000102 | 0.000244 | 0.004540 | 0.000516 | 0.000885 | 0
212 | 0.001650 | 0.000118 | 0.000357 | 0.002817 | 0.000449 | 0.000707 | 0
^212 | 0.001650 | 0.000119 | 0.000361 | 0.003521 | 0.000475 | 0.000743 | 0
256 | 0.005447 | 0.000210 | 0.000802 | 0.004125 | 0.000646 | 0.001042 | 0
1024 | 0.008405 | 0.000205 | 0.000996 | 0.003993 | 0.000475 | 0.000819 | 0
4096 | 0.008405 | 0.000175 | 0.000860 | 0.003054 | 0.000472 | 0.000782 | 0
The changes to the 212-point curve put its ΔL* right between the 1024- and 4096-point curves, so I would still consider it a no-brainer replacement for the standard 1024-point curve. The 182-point curve fared worse in ΔL* but is still quite good, and it has the best overall fit based on RMSE.
Now look what happens when we increase to 10-bit interpolation (1024 input samples):
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
19 | 0.042879 | 0.003594 | 0.007416 | 0.156134 | 0.026705 | 0.038168 | 2
20 | 0.037908 | 0.003614 | 0.007497 | 0.134980 | 0.024678 | 0.035373 | 2
*26 | 0.034650 | 0.001994 | 0.005349 | 0.118553 | 0.014295 | 0.021400 | 2
26 | 0.032418 | 0.002248 | 0.005280 | 0.101960 | 0.014276 | 0.020789 | 2
32 | 0.019770 | 0.001742 | 0.004119 | 0.057270 | 0.009457 | 0.013254 | 1
42 | 0.010831 | 0.000711 | 0.001501 | 0.034022 | 0.005139 | 0.007341 | 1
56 | 0.007831 | 0.000521 | 0.001264 | 0.020659 | 0.002989 | 0.004236 | 0
63 | 0.005564 | 0.000353 | 0.000767 | 0.016122 | 0.002293 | 0.003257 | 0
124 | 0.005790 | 0.000203 | 0.000804 | 0.006320 | 0.000701 | 0.001046 | 0
182 | 0.001697 | 0.000111 | 0.000265 | 0.006673 | 0.000587 | 0.000977 | 0
^182 | 0.001478 | 0.000110 | 0.000260 | 0.004644 | 0.000561 | 0.000932 | 0
212 | 0.002159 | 0.000130 | 0.000379 | 0.004728 | 0.000527 | 0.000802 | 0
^212 | 0.001883 | 0.000129 | 0.000379 | 0.003708 | 0.000501 | 0.000774 | 0
256 | 0.005447 | 0.000200 | 0.000782 | 0.005247 | 0.000608 | 0.000980 | 0
1024 | 0.008405 | 0.000240 | 0.001044 | 0.004104 | 0.000617 | 0.000993 | 0
4096 | 0.008996 | 0.000224 | 0.001054 | 0.003897 | 0.000506 | 0.000853 | 0
The refined 212-point curve now has the lowest ΔL* of any curve on all three measures (max, mean, and RMS). And notice that the smaller curves are starting to show round-trip errors at this sample resolution.
Next up, I’ll test them at 12 bits (4096 input samples):
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
19 | 0.043820 | 0.003601 | 0.007422 | 0.162734 | 0.026727 | 0.038190 | 8
20 | 0.038794 | 0.003622 | 0.007506 | 0.141046 | 0.024697 | 0.035393 | 8
*26 | 0.034660 | 0.001997 | 0.005351 | 0.119456 | 0.014308 | 0.021410 | 7
26 | 0.032431 | 0.002252 | 0.005284 | 0.102827 | 0.014287 | 0.020800 | 6
32 | 0.019606 | 0.001746 | 0.004125 | 0.056738 | 0.009444 | 0.013208 | 4
42 | 0.010970 | 0.000712 | 0.001500 | 0.034507 | 0.005144 | 0.007346 | 2
56 | 0.007827 | 0.000522 | 0.001265 | 0.022992 | 0.002991 | 0.004235 | 1
63 | 0.006099 | 0.000353 | 0.000767 | 0.015857 | 0.002288 | 0.003245 | 1
124 | 0.005790 | 0.000205 | 0.000812 | 0.006420 | 0.000701 | 0.001047 | 1
182 | 0.002016 | 0.000112 | 0.000266 | 0.008137 | 0.000588 | 0.000983 | 0
^182 | 0.001482 | 0.000110 | 0.000261 | 0.005065 | 0.000561 | 0.000936 | 0
212 | 0.002439 | 0.000130 | 0.000381 | 0.005398 | 0.000528 | 0.000802 | 0
^212 | 0.001904 | 0.000129 | 0.000381 | 0.003735 | 0.000502 | 0.000775 | 0
256 | 0.005447 | 0.000202 | 0.000787 | 0.005244 | 0.000610 | 0.000981 | 0
1024 | 0.008405 | 0.000222 | 0.001025 | 0.003972 | 0.000508 | 0.000852 | 0
4096 | 0.192685 | 0.000376 | 0.004745 | 0.004194 | 0.000628 | 0.001014 | 0
Look what’s happened with the 4096-point curve. Now that we’re using all of its points, we can see it’s got a serious flaw. Its max error has jumped way up, and its ΔL* is now worse than the 1024-point curve’s. It’s easy to see why. Have a look at its values for the linear part of the curve:
0,1,2,4,5,6,7,9,10,11,12,14,15,16,17,19,20,21,22,24,25,26,27,28,30,31,32,33,35,36,37,38,40,41,42,43,45,46,47,48,50,51,52,53,55,56,57,58,59,61,62,63,64,66,67,68,69,71,72,73,74,76,77,78,79,81,82,83,84,85,87,88,89,90,92,93,94,95,97,98,99,100,102,103,104,105,107,108,109,110,111,113,114,115,116,118,119,120,121,123,124,125,126,128,129,130,131,133,134,135,136,137,139,140,141,142,144,145,146,147,149,150,151,152,154,155,156,157,159,160,161,162,164,165,166,167,168,170,171,172,173,175,176,177,178,180,181,182,183,185,186,187,188,190,191,192,193,194
Again, the problem is apparent. The steps are uneven, mostly following a 1-1-1-2 pattern with an occasional 1-1-1-1-2, where the true slope calls for a constant step of about 1.24. At that resolution, the 16-bit quantization is making it impossible to get the correct slope for the linear part of the curve, which is why the max error jumped up to over 19%. The 212-point curve is still looking outstanding, by the way. And the smaller curves are showing even larger round-trip errors.
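To see where those uneven steps come from, here is a minimal sketch that quantizes the linear segment of the sRGB curve for a 4096-point table. It assumes each entry is simply the ideal value rounded to the nearest 16-bit integer; the function and constant names are illustrative only, and the standard profile may have been generated differently, although the first few values it prints match the list above.

```python
# Sketch only: assumes entries are the ideal values rounded to 16 bits.
POINTS = 4096
LINEAR_SLOPE = 1 / 12.92   # sRGB linear segment: linear = encoded / 12.92
LINEAR_LIMIT = 0.04045     # encoded value where the linear segment ends

def quantized_linear_segment(points=POINTS):
    """Return the 16-bit table entries that fall on the linear segment."""
    entries = []
    for i in range(points):
        encoded = i / (points - 1)
        if encoded > LINEAR_LIMIT:
            break
        entries.append(round(encoded * LINEAR_SLOPE * 65535))
    return entries

entries = quantized_linear_segment()
# Difference between each pair of neighbouring entries.
steps = [b - a for a, b in zip(entries, entries[1:])]
# Step the table would need if entries were not restricted to integers.
ideal_step = 65535 / (POINTS - 1) * LINEAR_SLOPE

print(entries[:12])   # [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 14]
print(steps[:12])
print(f"ideal step: {ideal_step:.4f}")
```

Because the ideal step is a fraction between 1 and 2, every integer entry is forced to move by either 1 or 2, so the table can only approximate the slope on average.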
And finally, let’s see what it looks like if we interpolate all possible 16-bit samples (65536 of them) with these curves.
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL | Max RT Error
19 | 0.044230 | 0.003603 | 0.007423 | 0.162631 | 0.026733 | 0.038193 | 135
20 | 0.039190 | 0.003625 | 0.007508 | 0.141160 | 0.024702 | 0.035397 | 123
*26 | 0.034661 | 0.001997 | 0.005352 | 0.120915 | 0.014312 | 0.021413 | 114
26 | 0.032431 | 0.002254 | 0.005286 | 0.104229 | 0.014291 | 0.020803 | 98
32 | 0.019766 | 0.001749 | 0.004129 | 0.057259 | 0.009446 | 0.013210 | 65
42 | 0.011194 | 0.000712 | 0.001501 | 0.035285 | 0.005145 | 0.007347 | 36
56 | 0.007860 | 0.000522 | 0.001265 | 0.023274 | 0.002991 | 0.004235 | 24
63 | 0.006172 | 0.000353 | 0.000768 | 0.016116 | 0.002288 | 0.003246 | 18
124 | 0.005790 | 0.000206 | 0.000815 | 0.006580 | 0.000701 | 0.001047 | 9
182 | 0.002045 | 0.000112 | 0.000266 | 0.008139 | 0.000588 | 0.000983 | 7
^182 | 0.001482 | 0.000110 | 0.000261 | 0.005133 | 0.000562 | 0.000936 | 5
212 | 0.002560 | 0.000131 | 0.000382 | 0.005650 | 0.000528 | 0.000803 | 7
^212 | 0.001905 | 0.000130 | 0.000381 | 0.003738 | 0.000502 | 0.000775 | 5
256 | 0.005447 | 0.000203 | 0.000789 | 0.005248 | 0.000611 | 0.000981 | 6
1024 | 0.008405 | 0.000223 | 0.001028 | 0.004089 | 0.000509 | 0.000853 | 6
4096 | 0.192685 | 0.000324 | 0.004697 | 0.004178 | 0.000497 | 0.000820 | 6
At this sample resolution, none of the curves pass the round-trip test, but you can see that, once again, the refined 212-point curve shows the least visual error. This test also reinforces the validity of the ΔL* measure. The max round-trip error is predicted by and follows the ΔL*. sRGB is not quite as perceptually uniform as L*, so it’s not a 100% match, but it’s a very good predictor of what will happen as the sample resolution increases. A difference of 6/65535 (0.000092) is most certainly not going to be visible, but if you can drop that error to 5/65535 and save over 1.5KB off the ICC profile size at the same time, that’s a no-brainer.
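The arithmetic behind that last comparison is easy to check. The short sketch below assumes 2 bytes per curve point, as noted earlier in this post, and uses helper names that are purely illustrative.

```python
FULL_SCALE = 65535       # largest 16-bit value
BYTES_PER_POINT = 2      # each curve point is a 16-bit integer

def rt_error_fraction(levels):
    """Express a round-trip error in 16-bit levels as a fraction of full scale."""
    return levels / FULL_SCALE

def point_bytes_saved(old_points, new_points):
    """Bytes of point data saved by switching to a shorter curve."""
    return (old_points - new_points) * BYTES_PER_POINT

print(f"{rt_error_fraction(6):.6f}")   # 0.000092
print(f"{rt_error_fraction(5):.6f}")   # 0.000076
print(point_bytes_saved(1024, 212))    # 1624
```

That 1624 bytes is the saving for a single curve replacing the standard 1024-point one.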
And that just left one question to answer before I could wrap up my curve testing. What would happen if you used these curves in a target profile rather than a source profile? With a source profile, you can predict exactly which values will be looked up or interpolated from the curve, because those values are defined by the bit-depth of the image. 8 bits means exactly 256 values can be looked up, etc. That’s what we tested. With a target profile, however, the curve is used in reverse. Output values become input values and vice-versa. And the input values become unpredictable. They could be any floating-point number between 0 and 1. So that left me with one test to run.
For these final numbers, I generated a set of 1 million random floating-point numbers between 0 and 1, and interpolated the output values for them. The round-trip test becomes meaningless in this case because you can’t round-trip a random number, but the rest of the numbers can be interpreted the same as before.
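As a rough illustration of this kind of test (not the harness that produced the numbers in this post), the sketch below builds curves by sampling the exact sRGB function at even intervals, interpolates random inputs through them, and reports max, mean, and RMS error. Everything here is an assumption for demonstration: the curves are untuned, errors are in plain 0 to 1 linear units, the sample count is reduced to keep the run short, and the ΔL* columns are not computed, so the output will not match the table that follows.

```python
import math
import random

def srgb_to_linear(v):
    """Exact sRGB decoding function (encoded 0..1 -> linear 0..1)."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def make_curve(points):
    """Sample the exact curve at even intervals, quantized to 16 bits."""
    return [round(srgb_to_linear(i / (points - 1)) * 65535)
            for i in range(points)]

def interpolate(curve, v):
    """Linearly interpolate a 0..1 input through a 16-bit point table."""
    pos = v * (len(curve) - 1)
    lo = min(int(pos), len(curve) - 2)   # clamp so lo + 1 stays in range
    frac = pos - lo
    return (curve[lo] + (curve[lo + 1] - curve[lo]) * frac) / 65535

def error_stats(curve, samples):
    """Return (max, mean, RMS) absolute error in 0..1 linear units."""
    errors = [abs(interpolate(curve, v) - srgb_to_linear(v)) for v in samples]
    mean = sum(errors) / len(errors)
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return max(errors), mean, rms

rng = random.Random(1)   # fixed seed (arbitrary) so runs are repeatable
samples = [rng.random() for _ in range(100_000)]
for points in (26, 212, 1024):
    worst, mean, rms = error_stats(make_curve(points), samples)
    print(f"{points:5d}  max={worst:.6f}  mean={mean:.6f}  rms={rms:.6f}")
```

Raising the sample count to 1,000,000 only changes the `range` argument; the structure of the test stays the same.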
Points | Max Error | Mean Error | RMS Error | Max DeltaL | Mean DeltaL | RMS DeltaL
19 | 0.044252 | 0.003601 | 0.007417 | 0.162732 | 0.026701 | 0.038136
20 | 0.039191 | 0.003632 | 0.007520 | 0.141245 | 0.024726 | 0.035427
*26 | 0.034661 | 0.002002 | 0.005362 | 0.120960 | 0.014326 | 0.021432
26 | 0.032431 | 0.002261 | 0.005299 | 0.104273 | 0.014305 | 0.020825
32 | 0.019764 | 0.001752 | 0.004138 | 0.057252 | 0.009441 | 0.013209
42 | 0.011204 | 0.000714 | 0.001503 | 0.035317 | 0.005147 | 0.007351
56 | 0.007860 | 0.000522 | 0.001263 | 0.023292 | 0.002987 | 0.004229
63 | 0.006174 | 0.000354 | 0.000770 | 0.016113 | 0.002292 | 0.003253
124 | 0.005790 | 0.000207 | 0.000819 | 0.006588 | 0.000700 | 0.001046
182 | 0.002045 | 0.000112 | 0.000267 | 0.008148 | 0.000588 | 0.000984
^182 | 0.001482 | 0.000110 | 0.000261 | 0.005147 | 0.000561 | 0.000935
212 | 0.002566 | 0.000131 | 0.000382 | 0.005663 | 0.000528 | 0.000803
^212 | 0.001905 | 0.000130 | 0.000382 | 0.003738 | 0.000503 | 0.000776
256 | 0.005447 | 0.000204 | 0.000793 | 0.005248 | 0.000611 | 0.000981
1024 | 0.008405 | 0.000225 | 0.001035 | 0.004100 | 0.000509 | 0.000853
4096 | 0.192685 | 0.000331 | 0.004806 | 0.004190 | 0.000497 | 0.000821
And the results are just about the same as before. So that does it… I’m convinced that my refined 212-point curve is not just the best fit for 8-bit image conversion – I believe it’s the best overall fit possible for the sRGB gamma curve within the restrictions of the v2 ICC profile format. I call it the Magic Curve, natch.
For a space-saving curve, any of those options between 32 and 63 points would be a huge improvement over Facebook’s 26-point attempt. I’ll be making a few size-conscious profile options with those and testing them out.
And the smallest usable curve is really 20 points. Although the 19-point curve was also valid according to the 8-bit round-trip test, it’s kind of pointless because an odd number of curve points means that the ‘curv’ tag has to be padded by 2 bytes to maintain alignment. You may as well include the extra point if it helps accuracy – and it does in this case. I’ll make what I believe to be the smallest possible sRGB-compatible profile (410 bytes) using that 20-point curve. Note that it is worse than the TinyRGB curve in terms of accuracy, but it’s not as much worse as the 32-point curve is better. Which is to say, once again, the 26-point curve is not at all special in its size/accuracy ratio.
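The padding argument can be checked with a few lines of arithmetic. This sketch assumes the layout described earlier in this post (a 12-byte header plus 2 bytes per point) and 4-byte alignment of tag data; the function name is made up for illustration.

```python
CURV_HEADER_BYTES = 12   # 'curv' signature, reserved field, entry count
BYTES_PER_POINT = 2      # each entry is an unsigned 16-bit integer
ALIGNMENT = 4            # assumed 4-byte alignment of tag data

def curv_tag_size(points):
    """Return (raw size, padded size) in bytes for an n-point 'curv' tag."""
    raw = CURV_HEADER_BYTES + BYTES_PER_POINT * points
    # Round up to the next multiple of ALIGNMENT.
    padded = (raw + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT
    return raw, padded

for points in (19, 20, 26):
    raw, padded = curv_tag_size(points)
    print(points, raw, padded)
# 19 50 52
# 20 52 52
# 26 64 64
```

Under those assumptions the 19- and 20-point curves occupy the same 52 bytes, so the 20th point costs nothing.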
Check the final post in this series for details on those profiles, some real-world tests using them, and of course, download links. In the meantime, I have some investigation to do regarding the XYZ color values used in sRGB profiles. That topic turned out to be another tricky one.