PhotoSauce Blog


Most of the time when I write about System.Drawing/GDI+, I’m pointing out its flaws and talking about how much better MagicScaler is for server-side image processing. It’s odd, then, that I now find myself writing a post defending it. In my last post, I quoted the documentation page from ImageResizer’s FastScaling plugin and said I’d address a part of it I skipped over. Here it is:

Unlike DrawImage, [FastScaling] uses orthogonal/separable resampling, and requires less of the CPU cache.

For those who haven’t studied up on Eric Lippert’s Five Dollar Words for Programmers™ or aren’t familiar with the basic mechanics of image resampling, I’ll give some background. Orthogonal/separable in this context simply means that instead of resizing both dimensions (width and height) at once, calculating the final value for each output pixel in a single pass, you can resize each dimension separately. It works out that for almost all standard resampling algorithms, you can do it either way and get the exact same results. The reason this matters is that if you’re using a resampling algorithm that samples each pixel value more than once -- and any good one will -- it’s much less expensive to do it orthogonally.

Take, for example, the cubic resampler we tested in Part 1. Cubics usually require a sample window of 4 (remember FastScaling got that wrong by default), which means they sample a 4x4 pixel area in the source image to determine the value of a single pixel in the output image. On top of that, when you scale an image down, you must scale the sample area up proportionally to make sure you sample all the source pixels. Scaling the sample area up is effectively what makes a high-quality scaler high quality. Low-quality cubic scalers (like the one in WIC) just stick with 4x4 regardless of ratio.

So if, as we did in Part 1, you’re scaling a 6000x4000 image down to 400x267 (a 15:1 reduction), you need to sample a 60x60 pixel area from the input for each pixel in the output (the 4x4 window scaled up by that same 15:1 ratio). That would mean, in a naïve implementation, you would have to process 400*267*60*60 (384.5 million) pixels to perform that resize. In other words, you would read and perform calculations on each of the 24 million input pixels 16 times (the 4x4 sample size). And for RGB or RGBA images, those numbers would be multiplied by 3 or 4 channels, respectively. You could easily be doing over a billion sample calculations for this seemingly-simple resize operation.

To do the same resize orthogonally, you would first resize to 400x4000, sampling only in the horizontal dimension, so you sample only 60 pixels for each output pixel. That’s 400*4000*60 (96 million) pixels for the first dimension. Then 400*267*60 (6.4 million) for the other dimension. That’s a grand total of 102.4 million pixels processed instead of 384.5 million, a huge savings considering they produce the same result.
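If you want to sanity-check that arithmetic, it’s easy to reproduce. This little snippet just restates the numbers from the example above; nothing in it comes from any of the libraries being tested:

    using System;

    // Back-of-the-envelope sample counts for the 6000x4000 -> 400x267 example.
    // The cubic's 4-pixel support window is scaled up by the 15:1 reduction ratio.
    int srcW = 6000, srcH = 4000;
    int dstW = 400, dstH = 267;
    int window = 4 * (srcW / dstW);                         // 4 * 15 = 60 samples per axis

    long naive      = (long)dstW * dstH * window * window;  // ~384.5 million
    long horizontal = (long)dstW * srcH * window;           // ~96 million
    long vertical   = (long)dstW * dstH * window;           // ~6.4 million
    long separable  = horizontal + vertical;                // ~102.4 million

    Console.WriteLine($"naive: {naive:N0}  separable: {separable:N0}");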

Besides the huge reduction in work done, the other benefit of processing orthogonally is cache locality. During the first step of the resize -- where 94% of the processing is done in this example -- the pixels being processed are located in the same line(s) and are, therefore, contiguous in memory. That improves your cache hit ratio. This is the reason almost all resizers will process the horizontal dimension first.

It would be downright foolish to do it any other way, really -- unless you had a good reason to. It turns out (and I have to thank Nathanael Jones, the creator of ImageResizer/FastScaling, for pointing this out to me) that DrawImage() does have a reason to do it otherwise. Some of its many, many overloads allow you to pass in an arbitrary set of 3 points, which it uses to define a destination parallelogram. That allows you to do things like this:

[Image: the reference photo skewed into a parallelogram by DrawImage()]

Neat… I guess…
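For reference, that skew comes from one of the Point[] overloads. A minimal sketch (the file name is just a placeholder); the three points are the upper-left, upper-right, and lower-left corners of the destination parallelogram:

    using System.Drawing;

    using var src = Image.FromFile(@"witch.jpg");          // placeholder file name
    using var dst = new Bitmap(src.Width, src.Height);
    using var g = Graphics.FromImage(dst);

    // Upper-left, upper-right, and lower-left corners of the destination
    // parallelogram. Because the destination need not be an axis-aligned
    // rectangle, GDI+ can't assume a separable (orthogonal) resize.
    var corners = new[]
    {
        new Point(150, 0),
        new Point(src.Width, 100),
        new Point(0, src.Height - 100)
    };
    g.DrawImage(src, corners);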

But orthogonal processing only works for rectangles, so in order to support this very fancy feature, DrawImage() has to do it the hard way. You pay that penalty every time you use it.

Given that, it should be quite easy for any scaler that doesn’t pay that penalty to beat GDI+. We saw in Part 1 that FastScaling did, but only barely. In this post, we’ll look at some cases where it doesn’t at all. That means more benchmarks! Yay!

Before that, though, I have one more quote to review from the FastScaling page:

Graphics.DrawImage() holds a process-wide lock, and is a very severe bottleneck for any imaging work on the GDI+/.NET platform. This is unfortunate, as WIC and WPF do not offer any high-quality resampling filters, and DirectX is 10-20X slower than DrawImage.

DrawImage also implements a general distortion filter. This type of filter thrashes the CPU cache; it is not optimized for linear memory access. It does not parallelize well on multiple cores even when used in separate processes.

There’s a lot to digest in those short paragraphs. I’ll start with the statements that are true:

DrawImage() does hold a process-wide lock. We’ve seen evidence of it in the benchmarks I’ve run through so far. All calls to DrawImage() are serialized, and in fact, when multiple calls are made, they will all block until they are all done. That’s why the performance numbers for my parallel test runs show almost no jitter in the timings. I’ll point that out again later in case you missed it the first time.

DrawImage() does also implement a ‘general distortion filter’, sort of. That’s actually not a term with which I was familiar, so, as I was taught when I was younger, I looked it up in the dictionary. I mean, I google-bing’ed it… The most plausible definition I could find comes from ImageMagick, which implements a class of resizing operations that are non-orthogonal so that they can be combined with an affine transform to do things like we saw above with the parallelogram, only with fancier options. Again, that is bad for caching, since the pixel data isn’t read sequentially as it is in the orthogonal case.

It’s also true that WIC (and WPF by extension) doesn’t have high-quality resampling filters [built-in]. Or at least it didn’t. Windows 10 added a high-quality Catmull-Rom filter, as I discussed in an update to my post on WIC interpolation. That should be present in Windows Server 2016 when it’s released as well, but I haven’t yet verified that. In any case, it’s not available on servers today in a built-in fashion.

But of course WIC is pluggable by design, and it’s possible to plug in a high-quality scaler. I know, because that’s exactly what the core of MagicScaler is. I took the superior architecture of WIC and plugged in the part that was missing. The statement above dismisses WIC as a useful solution because it’s missing something, but then it suggests that plugging the same type of component into the inferior and inherently non-pluggable GDI+ architecture is a good alternative. Bah, I say.

As for DirectX, it is not 10-20x slower than DrawImage(). DirectX is hardware accelerated, and its performance very much depends on your graphics hardware and the shader code you’re running on it. Integrating WIC with DirectX can yield amazing performance with the right hardware, and in fact, many of the improvements to WIC over the last couple of Windows releases have been related to integration with DirectX for hardware-accelerated imaging. Seriously, if you thought WIC looked fast before, that’s nothing. But since the target for FastScaling (like MagicScaler) is server apps, it is reasonably fair to rule out DirectX as a valid replacement for GDI+ functions. Most servers don’t have GPUs, and the ones that do are generally very expensive. Software-only processing in DirectX is relatively slow, so I can only hope the statement above was an allusion to that.

Those statements about WIC and DirectX seem to be justifications for staying within a GDI+ solution and simply replacing the supposedly broken DrawImage() implementation. That’s faulty logic, as GDI+’s shortcomings are not just limited to DrawImage() performance. We’ve already seen how much faster things can be in a different architecture (like WIC), and we’ll explore that a bit more in this post.

Back to the Numbers

We did see in Part 1 of this series that GDI+ came in last in our baseline benchmark. It wasn’t miles behind, but it was last. Is there anything it’s good at?

In order to answer that question, we’re going to need to do some more testing. As in the last post, I’ll try to minimize the number of variables in play between any two tests, so I’m going to start with the benchmark I ended with last time. But this time I’ll change just one thing. I’m going to switch the input image to an uncompressed RGB TIFF. I’ll explain why in a sec. But first the numbers:

[Image: benchmark results, RGB TIFF input]

A lot of interesting things happened here. Here’s how the numbers compare with the last test run I did. Again, I’m sticking with the single-threaded numbers for now.

              JPEG Input   TIFF Input
FastScaling   376ms        380ms
GDI+          405ms        367ms
WIC           36ms         75ms
MagicScaler   228ms        192ms

I’ll start with the simple ones. GDI+ and MagicScaler both improved by about 35ms in this test. That 35ms likely represents the reduction in decoding and pixel format conversion time for the 24MP image. JPEG takes more effort to decode than the uncompressed TIFF, so you’d expect all the tests to see a similar benefit from the removal of that workload.

The WIC resizer actually took quite a bit longer, though. There’s a simple explanation for that too. When resizing a JPEG source, WIC is able to use the decoder to do part of the scaling operation. I covered this fact in my examination of the WIC scaler some time back. The short version is, the JPEG decoder would have transformed the image from 6000x4000 down to 750x500 (an 8:1 reduction) before even handing a single pixel to the WIC scaler. That’s why the WIC numbers were so good in the last test. It finished the whole operation in less time than the others took to just decode the source image. That’s also why its parallel numbers were unrealistically good. There was very little processing going on compared to what you’d expect. Fancy, no? In case you’re wondering, I’m able to do the same in MagicScaler, but I’ve disabled that feature for these tests to keep them fair. The WIC results for this test are still quite impressive, but notice the parallel numbers are more in line with expectations.
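If you want to see that decoder-assisted scaling from managed code, the easiest illustration is WPF’s BitmapImage, which exposes it through DecodePixelWidth. This is only an illustration of the concept, not the code used in these benchmarks, and the file name is a placeholder:

    using System;
    using System.Windows.Media.Imaging;   // WPF (PresentationCore)

    var bmp = new BitmapImage();
    bmp.BeginInit();
    bmp.UriSource = new Uri("witch.jpg", UriKind.Relative);   // placeholder path
    // Asking for a smaller decode size lets the WIC JPEG decoder do a cheap
    // power-of-two reduction before any pixels reach the scaler, e.g. the
    // 6000x4000 source comes out of the decoder at 750x500 for this resize.
    bmp.DecodePixelWidth = 750;
    bmp.CacheOption = BitmapCacheOption.OnLoad;
    bmp.EndInit();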

The only one I can’t fully explain is the FastScaling result. My guess is it would have gained the same 35ms advantage as the others except it squandered that advantage with excessive memory allocations. You’ll see what I mean in just a bit. Large memory allocations are slow, and that’s my best guess for why it failed to improve as much as the others.

And in case you missed it, the biggest news here is that GDI+ is no longer in last place. FastScaling takes over that honor. They were close in the last test, and now they’ve flipped. GDI+ edges it out by just under 4%. I was surprised by these results, so I ran them a few times. They’re correct.

The real reason I switched to a TIFF input, though, was not to point out those things. I switched to limit the number of variables between this test and the next one.

You see, the reality is GDI+ is just not optimized for what we would like it to be. Remember, GDI+ was not made for server-side image processing and certainly not for web apps. It was made for processing images/graphics for display on the screen or sending to a printer. Its typical operations involve lots of compositing, layering, drawing, etc. Basically the kinds of things you need for windows. So it might make sense that GDI+ would do all of its processing in an RGBA pixel format. I posited as much in a previous post and showed some evidence to back that up. If you want to see GDI+ at its best, you have to give it the task it was actually built to do.

Here are the results of the same test I did above, only this time the input image was swapped for an RGBA TIFF. Of course this image has no transparency; it’s simply a format change to illustrate performance characteristics.

[Image: benchmark results, RGBA TIFF input]

Well, well, well… what do you know… GDI+ is much faster than FastScaling here. In fact, even on the 8 thread parallel test, GDI+ only took twice as long as FastScaling, and it had seven of its threads tied behind its back.

Remember the difference in pixel counts for my breakdown of orthogonal vs non-orthogonal processing earlier? Let’s revisit those calculations with this example. Processed non-orthogonally, this resize has 384.5 million pixels sampled, multiplied by 4 channels, for a total of just over 1.5 billion sample calculations. Processed orthogonally, that becomes 102.4 million pixels * 4 channels, which is just ~410 million sample calculations. DrawImage() is doing nearly 4 times as many calculations as FastScaling and completing 34% faster anyway.

What’s really interesting here is that if you compare the numbers across the last two tests, you’ll find DrawImage() was roughly the same speed with RGBA as it was with RGB, whereas all three of the other scalers were significantly slower (WIC doesn’t look that much worse, but it’s doing less than 1/4 the processing of the others). In fact, GDI+ was as fast at scaling in RGBA as FastScaling was in RGB. One might infer from those numbers that DrawImage() is missing the optimized code path for RGB that all three of the other scalers have. When doing the one thing it’s good at, GDI+ isn’t actually all that bad. And FastScaling looks a lot less clever by comparison.

Of course, it is a real bummer that DrawImage() isn’t optimized for RGB processing, and it’s a bummer that it doesn’t process orthogonally. Most of the work we do in web image resizing only involves rectangles. And most of it is on RGB images, particularly when we’re dealing with high-res photos. Those are usually JPEGs, which don’t support transparency at all. There is a huge benefit to taking the fast path on those images, and that’s a real opportunity for performance improvements. Anything that takes advantage of that opportunity should beat GDI+ performance-wise. Again, I’m actually surprised FastScaling failed to better GDI+ in the RGB TIFF test, but that’s what the numbers say.

With all that in mind, let’s look at MagicScaler’s numbers. They’re a decent improvement over GDI+ in RGBA mode, but nothing earth-shattering. We beat GDI+ handily in RGB (over 1.9x as fast), but it’s a much closer race in RGBA (25%).

And just for fun, because I guessed in that earlier post that GDI+ actually uses a Premultiplied Alpha format for its internal processing, let’s see how we all compare with that kind of input. Here is a test run with a PARGB TIFF input.

[Image: benchmark results, premultiplied-alpha (PARGB) TIFF input]

GDI+ gets even faster when given its native format for processing, and FastScaling gets even slower. Here GDI+ is almost 64% faster. Notice WIC also got faster with PARGB input, so we can assume its RGBA processing converts to PARGB as well. I haven’t built a native processor for PARGB in MagicScaler since this type of file is pretty rare in web resizing scenarios, but MagicScaler does still manage to edge out GDI+ even when it’s doing the thing it does best.

And in case you overlooked it again in the numbers, I want to revisit the comment I made about DrawImage blocking all concurrent calls until they all complete. You might expect that if I fired off 8 calls to DrawImage() on 8 threads one after another, the first one should finish in a normal amount of time and the last one should take the longest as it waits in the queue behind the other 7. We’d expect to see a huge standard deviation on those, but that’s not the case. They all returned at the same time. This behavior makes GDI+ scale even less well than you might have guessed in a server environment.
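If you want to see that serialization for yourself, a bare-bones sketch like this (hypothetical file name and sizes) will show all eight per-task times clustering together instead of fanning out:

    using System;
    using System.Diagnostics;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Threading.Tasks;

    var tasks = new Task<long>[8];
    for (int i = 0; i < tasks.Length; i++)
    {
        tasks[i] = Task.Run(() =>
        {
            var sw = Stopwatch.StartNew();
            using var src = Image.FromFile(@"witch.tif");        // placeholder
            using var dst = new Bitmap(400, 267);
            using var g = Graphics.FromImage(dst);
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(src, 0, 0, 400, 267);                    // serialized internally
            return sw.ElapsedMilliseconds;
        });
    }
    Task.WaitAll(tasks);

    // With DrawImage's process-wide lock, these times cluster together rather
    // than the first call finishing quickly and the last one much later.
    Console.WriteLine(string.Join(", ", Array.ConvertAll(tasks, t => t.Result)));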

Speaking of which…

FastScaling’s Dirty Little Secret

I promised in the last post that I’d reveal this, and I hinted at it earlier. A picture is worth a thousand words in this case.

This is a Visual Studio Diagnostic Tools trace of another benchmarking run configured the same way as the last one (PARGB TIFF input), although the results are similar regardless of input format.

[Image: Visual Studio Diagnostic Tools trace of the benchmark run]

Ok, maybe this picture requires just a few words…

Each test here had a breakpoint, followed by a forced Gen 2 garbage collection, followed by a 1-second sleep, followed by the 3 portions of the test (10 runs serial, 4 parallel, 8 parallel).
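In code, the skeleton of each segment looks roughly like this; RunSerial and RunParallel are hypothetical stand-ins for the real per-component test methods:

    using System;
    using System.Diagnostics;
    using System.Threading;

    // Hypothetical stand-ins for the real per-component test code.
    void RunSerial(int iterations) { /* timed resizes, one after another */ }
    void RunParallel(int degree)   { /* 'degree' resizes started concurrently */ }

    Debugger.Break();                                         // marks the segment in the Events track
    GC.Collect(2, GCCollectionMode.Forced, blocking: true);   // sweep out anything left by the last test
    Thread.Sleep(1000);                                       // visible gap in the CPU graph

    RunSerial(iterations: 10);
    RunParallel(degree: 4);
    RunParallel(degree: 8);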

The breakpoints separate each component’s test nicely in the Events section (note the gaps in the grey bar), and I’ve labeled each one using everyone’s favorite graphics program: MS Paint. The garbage collections (orange arrows) ensure nothing is left over from one test to the next, and the sleep puts a nice break in the CPU chart before each test gets going. If you haven’t used this tool before, hopefully the graphs are self-explanatory, but I’ll call out some details. This debug session, by the way, was run on a release build.

The baseline memory usage at the start of this graph is 156MiB. The reason it’s that high, even though nothing has happened yet, is that I pre-loaded the source file into memory so that I could wrap it in a MemoryStream and pass it to each component. I could have passed in the file name and had each component do its own read, but this is a 92MiB TIFF file, and when we get to the parallel processing tests, disk I/O could become a bottleneck. Using a shared copy in memory removes that possibility and makes the tests more fair. Each component still has to do the full decode and encode; we’re really just caching the source file.

When the GDI+ test begins, there is a step up in memory usage to 249MiB. That represents a decoded copy of the 92MiB source, plus another 1MiB of… miscellaneous. Because the source is uncompressed and already in PARGB format, the encoded and decoded images are the same size. For the entire duration of the GDI+ test run, the CPU chart is steady at ~13% (one of 8 virtual cores) and the memory usage is flat. It actually peaks at 255MiB, but that’s just noise at this level. So no matter how many parallel calls we made to DrawImage(), there was only ever one decoded copy of the image in memory and one core doing work.

I’ll pause here and point out that it’s really not cool that GDI+ decodes the entire image into memory at once and holds it there for the duration of processing. The fact that a 93MiB jump in memory usage looks so insignificant on this graph is a hint to just how out-of-control things got later. In isolation, I would have said that was way too much memory to dedicate to a single operation. That’s a real killer for scalability. Fortunately, this is mitigated by the fact that GDI+ will only ever do this once at a time, due to its non-reentrant nature. I don’t know if this is the actual reason for that design or if it has its roots in the fact that GDI+ was designed for screen use. Maybe it has to do with handles and device contexts and what-not. I dunno. Whatever the reason, GDI+ essentially protects you from yourself if you’re using it in a server environment. It may not scale well for speed, but at least it won’t run away with all your memory.

Moving on to the WIC test, you see GDI+’s in-memory decoded copy of the bitmap has been dropped, and we’re back to the baseline memory level, which has moved up to 157MiB by now, because we’re starting to fill in the test results in the UI. The important thing is, the memory usage remains flat throughout the test run, peaking at only 160MiB. WIC never has to load the entire source image because it processes the output by pulling only as many pixels as it needs at a time through the pipeline. The CPU usage is flat at one core for the duration of the serial runs, then we get a nice spike as the parallel tests kick off. From a server-side scalability standpoint, this segment is a beauty.

Then there’s the FastScaling test. If the WIC test was a scalability beauty, this one is U.G.L.Y. (it ain’t got no alibi). The lowest memory usage observed during this test was 345MiB. That’s 96MiB more than GDI+ ever used, and that’s the minimum. Near the beginning of the test you can see the memory usage creep up to a high of 624MiB before the garbage collector decides it’s time to take action. As the serial runs continue, we see a cycle of rolling hills in the memory usage, with the value repeatedly climbing to 536MiB before the GC kicks in again taking it back down to 444MiB. Then the parallel tests start, and all hell breaks loose. Memory usage peaked at over 2.7GiB during the 8 thread test. But at least they broke free of GDI+’s single-thread restriction. That’s worth it, right?

Finally, we get to the MagicScaler test, and you can see that, much like WIC (because they’re like this [holds up crossed fingers]), memory usage is almost flat through the entire test. It starts with a baseline of 159MiB and peaks at 179MiB. MagicScaler needs more buffering memory than WIC does because it’s doing a more complex resample operation, but 20MiB for 8 parallel operations on an image this size is quite reasonable, I think. Mostly, it looks like the WIC test but with higher CPU usage. Like I said, quality isn’t cheap.

There’s one final thing I want to address that you may have noticed in the above chart. There are a bunch of little red diamonds in the Events panel during the FastScaling run. Those are all exceptions thrown during the FastScaling test, but they’re all caught internally by the ImageResizer library. As far as I can tell, they didn’t affect the test results. The exception, in case you’re curious, was an UnauthorizedAccessException, saying “MemoryStream’s internal buffer cannot be accessed”. It appears ImageResizer was attempting to call GetBuffer() on the MemoryStream passed in to it. That MemoryStream wrapped an existing byte array, so that’s not allowed. I don’t know why ImageResizer didn’t just use the Stream it was given, but that may have been an attempt at optimization. The other components use the Stream interface, so that failure kept them on even ground.
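That behavior is easy to reproduce. A MemoryStream constructed over an existing array keeps its buffer hidden unless you explicitly opt in; a minimal sketch (placeholder file name):

    using System.IO;

    byte[] file = File.ReadAllBytes(@"witch.tif");     // placeholder for the preloaded source

    // Wrapping an existing array hides the buffer; GetBuffer() throws
    // UnauthorizedAccessException ("MemoryStream's internal buffer cannot be accessed").
    var hidden = new MemoryStream(file);
    // hidden.GetBuffer();   // would throw

    // The overload that marks the buffer publicly visible allows it.
    var visible = new MemoryStream(file, 0, file.Length, writable: false, publiclyVisible: true);
    byte[] buffer = visible.GetBuffer();                // returns the wrapped array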

Truth from Lies

I started off Part 1 of this series by saying that benchmarks can lie, because we can always make the numbers show whatever we want. But of course, benchmarks can also be a valuable tool for learning the true nature of performance. We used them to prove -- at least in my mind -- that DrawImage() isn’t really bad; it’s just misunderstood. Or more accurately: it’s misused. We saw that it could easily take over a system’s resources if it were allowed to, so its non-reentrant nature is probably a good thing. Removing that brake and allowing it to run wild is ill-advised, and yet that’s pretty much what FastScaling does, only worse. Most of FastScaling’s performance claims are rooted in the fact that it can and will take over your server if you let it.

We also saw that when GDI+ is doing what it was designed to do, it’s not terrible at it. A separate code path optimized for rectangular RGB images would have been nice, but that wasn’t part of its design. FastScaling obviously does have those optimizations, and on that front, we saw it does edge out DrawImage() performance-wise, sometimes. That’s a win, albeit a small one. Overall, though, it’s misplaced in an architecture that is hostile to server environments.

And we saw that there are significant trade-offs when it comes to performance vs quality. WIC was ridiculously fast, but the image quality with its Fant scaler isn’t good enough for most uses. Getting to the quality level of GDI+’s high-quality scaling negates most of the performance improvements FastScaling claims. There is benefit to flexibility, though, and having the ability to balance performance with quality is a good thing. There is a middle ground between WIC and GDI+, and FastScaling seeks to make its place there. MagicScaler seeks the same but takes a different approach. In my next post, we’ll start exploring that area and planting some flags.


I’ll open this post with an explanation of the title of this series for those who may not be familiar with the famous quote on which it is based. I’ll be looking at some benchmarks in this series and basically poking holes in them. Let me be clear, I don’t mean to imply that anyone is actually lying in their benchmarks. The important thing to remember about benchmarks is that they’re just numbers. And if you change enough variables, you can almost always come up with a number that serves to reinforce your position. I take care when I do benchmarks to ensure they are both realistic and fair, and I reserve the right to point out when other benchmarks are contrived or disingenuous. But in the end, I don’t expect you to take my word for it either. You should always test for yourself. I aim to give details to back up my numbers so that you can re-create them or explain how they came to be, but that doesn’t mean you shouldn’t question them.

The reason for this series is that I recently became aware of another image scaling component that promises to deliver superior quality and performance over System.Drawing/GDI+ for server-side applications (hey, that’s my line!). I had intended to do all of my benchmarking against my own reference GDI+ resizer so that I could be sure I was doing a like-for-like comparison where possible and because GDI+ was the only real competition I had. In cases where MagicScaler took advantage of a shortcut that wasn’t available in GDI+ or did some additional processing for quality that GDI+ couldn’t match, I’d point that out. But I never got around to doing those detailed benchmarks, and it’s just as well, because now I have something much more interesting to compare to.

The benchmarks in question come from the documentation for the FastScaling plugin from the ImageResizer library. I previously looked at ImageResizer when discussing optimization of DrawImage() and used it as an example of how performance can suffer if the GDI+ settings are not chosen carefully. That the FastScaling plugin is part of the same project is quite interesting because they make some truly impressive performance claims, especially given the poor performance of their GDI+ implementation. I’ll start by quoting from the page linked above.

Unlike DrawImage, [FastScaling] uses orthogonal/separable resampling, and requires less of the CPU cache. On a 4-core laptop, this can translate into a 16-30X increase in throughput when benchmarked against DrawImage for photos from your average 16MP pocket camera. Even when executed on a single-thread, this can mean a 5-12X performance advantage. On a Azure D14 instance, FastScaling has been measured to be 43x faster).

I’ll address the first sentence of that quote later. For now, I’ll focus on the meat of it: the numbers.

There are a lot of numbers crammed in there, and to be quite honest they gave me a bit of a scare at first glance, as my own benchmarking of MagicScaler had shown it to peak at around 10x the performance of GDI+ on a single thread. And the performance gains I achieved required some techniques that aren’t even available in GDI+, like DCT-domain scaling in the JPEG decoder and planar processing of the image. So this must be one fast scaler indeed. The documentation page also explains that the plugin requires the Visual C++ redistributable package, and it comes in platform-specific binaries in the Nuget packages. Surely I wouldn’t be able to match the performance of carefully-tuned C++ (and possibly assembly?) code with my C# implementation, no matter how awesome the JIT. It’s a good thing I optimized the entire pipeline, or this would be a non-starter. But there’s plenty that looks fishy in those numbers, so I wasn’t nervous for too long. And I wasn’t about to accept those benchmarks without testing them for myself.

Fortunately, I just happen to be using a 4-core laptop right now and have some nice 16MP JPEGs lying around my office, so we can put those numbers to the test. As for the Azure D14 (16 Xeon cores / 112GB RAM) instance, I’m not even going to bother with that. First of all, either the resizers scale* well with extra processors, or they don’t. We can see that as well with 4 cores as we can with 16. Secondly, I don’t know too many websites that run on that kind of hardware. That’s a beefy box for a web server, and we usually scale web servers out, not up. I’d argue that my quad-core laptop is much more representative of a typical web server than a D14 is. Plus I’m lazy and don’t want to set one up. So there.

* It gets really difficult to disambiguate between performance scalability and image scaling in the same topic. I trust my audience is astute enough to know which ‘scale’ I mean given the context, so I won’t bother.

Then there’s this:

Will you always see benefits this drastic? No. For tiny images that can fit - in their entirety - on your CPU cache (say 300x300), the gain is small - 25% on a single thread, or 0.8 x core count for a throughput increase.

If you have a large number of cores, you might see much larger increases in throughput.

Including the jpeg encoding and decoding cost (which FastScaling does not address), enabling FastScaling will translate into roughly a 4.5-9.6X increase in overall performance for 16MP images.

Ah, that’s more like it… a touch of realism. So the first set of numbers didn’t include decoding or encoding, which makes them fairly pointless. Decoding and encoding are part of the process, and they can’t rightly be excluded just to make the numbers bigger. We’re talking about components designed for server-side use, so decoding the entire image before processing is a bad practice. Keep in mind, a decoded 16MP RGB image would occupy 48MB of memory. If you’re dedicating that much memory to serving a single image request, you’re doing it wrong. Maybe we have different definitions of scalability, but that does not compute for me. The real-world numbers are the only ones that matter here, but they’re not the ones that are bolded.

There’s one more clue in that quote as to the nature of the performance numbers given. They’re very much dependent on scaling up to more than one core. That explains the D14. More cores means a bigger number. The reason for that is simple: DrawImage() is non-reentrant, so no matter how big your box, it will only ever use one core per process. Removing that limitation is an instant performance win since you can now scale up. Of course, there’s a reason why DrawImage() is implemented that way, and I’ll show you what that might be in the next post. It’s worth building something that removes that limitation, but only if you address the underlying reasons for that limitation (hint: they didn’t).

One final quote before we move on:

FastScaling can make a range of adjustments to favor speed or quality on very large images. These typically set default values for the filter type/size and for averaging optimizations.

Averaging optimization: If you're downscaling an image to 1/20th of its size, FastScaling will use an averaging filter to scale it to 1/6th, then scale the remaining 1/3.333 using a high-quality filter. Since no filters use a window larger than ~3x the scale filter, this does not measurably affect quality. You can make this optimization more agressive [sic] by increasing the speed value, or reduce it by specifying a negative value.

Ok, so that part about ‘averaging optimization’ is referring to a technique I also employ in MagicScaler. It’s a fairly common practice, and as I mentioned in a previous post, it would be possible to do this with DrawImage() as well. In fact, I might add it to my GDI+ resizer just for grins. It does measurably affect quality, as we’ll see later, but it’s a tradeoff that is acceptable sometimes.
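If you wanted to bolt a similar shortcut onto a GDI+ resizer, it might look something like this two-stage sketch. The 3x intermediate threshold is purely illustrative, not the rule FastScaling or MagicScaler actually uses, and GDI+ has no true box/averaging filter, so the first stage trades some quality for speed:

    using System;
    using System.Drawing;
    using System.Drawing.Drawing2D;

    static Bitmap HybridResize(Image src, int dstW, int dstH)
    {
        // Stage 1: quick, lower-quality reduction to roughly 3x the target size.
        int midW = Math.Max(dstW, Math.Min(src.Width, dstW * 3));
        int midH = Math.Max(dstH, Math.Min(src.Height, dstH * 3));

        using var mid = new Bitmap(midW, midH);
        using (var g = Graphics.FromImage(mid))
        {
            g.InterpolationMode = InterpolationMode.Bilinear;           // cheap first pass
            g.DrawImage(src, 0, 0, midW, midH);
        }

        // Stage 2: the expensive, high-quality filter only has a small ratio left to cover.
        var dst = new Bitmap(dstW, dstH);
        using (var g = Graphics.FromImage(dst))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(mid, 0, 0, dstW, dstH);
        }
        return dst;
    }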

The less clear part of that quote is the ‘range of adjustments to favor speed or quality’. It certainly looks like the numbers given are the result of some of these adjustments and shortcuts, but there are no specifics. The graphs shown in the page have labels for ‘speed prioritized’ and ‘quality optimized’, but it’s not clear how much quality was sacrificed in order to bump up the performance numbers, nor does there seem to be a speed comparison with identical quality to DrawImage(). That’s not cool if you ask me. Relative performance is only meaningful if you understand the baseline, and too many variables were changed for the numbers to be comparable.

To be fair, it does appear the source code for the benchmark program is available on github, but I don’t really think it’s right to put all the big numbers and pretty graphs on the page and then expect someone to dig into your source code to figure out just how many shortcuts you took to get those numbers. The numbers should reflect real-world scenarios, and where quality differs, that should be explicitly spelled out or better yet, shown. It’s totally cool to trade quality for performance, but let’s be clear about it. Those are the rules I’ll be following in this series.

/rant.

Evening the Playing Field

With that last point about image quality in mind, I figured it would be best to start off by measuring baseline performance between GDI+ and FastScaling at the same quality level. If, as that opening quote implied, FastScaling is implemented in a cleverer way than DrawImage(), it should win any head-to-head comparison by a significant margin, without having to sacrifice quality to do it. We’ll be verifying that.

The first step in establishing that baseline will be examining the resampling algorithms offered by FastScaling so we can match them up with their equivalents in GDI+ (and MagicScaler). Ensuring that we’re using the same algorithms/parameters means we’ll know we’re doing the same work and getting the same output quality.

Looking through the docs, I see FastScaling has a long list of supported resampling filters, so I won’t list them all here. It appears they’re using standard names for some of the filters, and some appear to be made-up variants of those standards. Some, I’ve never heard of. That makes comparisons tricky, but we’ve seen before that ResampleScope can reveal much about the implementations.

I’ll point out now that ImageResizer and FastScaling are open source, so I could just dig into the source and find out how they’re actually implemented, but I won’t be doing that. There is much overlap between those projects and mine, so I prefer to remain ignorant of their implementations. For that reason, I’ll stick with the documentation available on the website.

Fortunately -- or unfortunately, depending on how many comparisons you wanted to see -- there appears to be very little overlap in the capabilities of GDI+ and FastScaling. We know from my earlier experimentation with the GDI+ InterpolationMode values that there are only two of them that are actually suitable for large-ratio downscaling, and only one of those appears to match up with the filters included in FastScaling.

From the docs:

To mimic DrawImage precisely, use Cubic

Ok.

I’ll assume they mean DrawImage() using the HighQualityBicubic interpolation mode since that’s the default in ImageResizer. We saw in my review of the GDI+ InterpolationMode values that the HighQualityBicubic is actually adaptive depending on resize ratio. I doubt they did actually mimic that ‘precisely’, but we’ll see what the graphs say.

In addition to specifying the algorithm/filter, I’ll also need to explicitly disable the averaging shortcut (and possibly other shortcuts). Those might not come into play now since ResampleScope uses small images anyway, but I’ll use this just to be safe and to be consistent with the next few tests.

&down.speed = -2..4 (default 0) -2 disables averaging optimizations.

I’m using FastScaling 4.0.5 x64, freshly installed from Nuget, and I ended up with the following options for my tests:

width={w}&height={h}&mode=stretch&format=png&fastscale=true&down.speed=-2&down.filter=cubic

The ‘mode=stretch’ is necessary for now because ResampleScope works with images resized on a single dimension. Let’s try those settings and see what we get. First I resize down by 2px.

[Image: ResampleScope graph of FastScaling’s cubic filter, 2px reduction]

This looks mostly like a Cardinal (0, 1) Cubic, which means it’s not a match for GDI+ at all at this size (it’s not until a much larger downscale ratio that GDI+ uses a standard cubic). But there’s something strange going on with the top of the curve. It’s slightly flattened, and it pokes up slightly above 1. Let’s compare that to MagicScaler.

[Image: ResampleScope graph of MagicScaler’s cubic filter, 2px reduction]

Yep, it’s not quite right. This is not a great start, but I guess we’ll call it close enough for now. Moving on to a 50% shrink…

[Image: ResampleScope graph of FastScaling’s cubic filter, 50% reduction]

Here, the shape of the curve is even more wrong, with the ends jumping up to 0 prematurely. This will lead to a fuzzier image, as it’s the negative lobes that give this filter its sharpening effect. Contrast that with the curve from MagicScaler, which remains the same regardless of resize ratio.

[Image: ResampleScope graph of MagicScaler’s cubic filter, 50% reduction]

And finally, we’ll do a shrink to 25%. At this ratio, GDI+ does use a Cardinal (0, 1) Cubic, so if it gets this one right, it’s at least a match in some cases.

[Image: ResampleScope graph of FastScaling’s cubic filter, 25% reduction]

Nope. It’s even worse. Here it is compared with the GDI+ curve. Note again that the ‘arms’ in the correct implementation go smoothly (well, as smoothly as this graph can show) up to the origin line.

[Image: FastScaling’s 25% curve compared with the GDI+ cubic]

What’s Wrong with This Picture?

Ok, so the graphs don’t match expectations, but what does this mean for actual image quality? Fear not, I have visual comparisons to go with those mathematical ones. I have a 6000x4000 reference image I’ve been using for a lot of my testing. I like it for a few reasons:

  1. At 24MP, it’s among the biggest images you’ll find in the wild, but not unrealistically big. Most of the new mid-to-high-end DSLRs have standardized around 24MP in their last generation or two.
  2. The bigness of the image exaggerates the impact of the scaler performance. This is the same reason the FastScaling benchmarks used 16MP images, I assume. I’m just kicking it up a notch for now. Bam!
  3. It’s a human face. Humans are very attuned to spotting differences in faces, so even without a mathematical comparison, it’s usually easy to spot differences in quality or sharpness.
  4. In order to test these components thoroughly, we need to be able to experiment with images in different container formats and pixel formats. I happen to have 14 variants of this image saved every which-way.
  5. Get it? Which-way? It’s a witch. Witches are magic. My thing is called MagicScaler. I’ll see myself out…

I’ve applied the settings used above and resized the original JPEG version of the 6000x4000 reference image to a 400x267 PNG in my test/benchmarking harness. The reason I output PNG at this stage instead of JPEG is that I want to compare the pixels output from each scaler, and JPEG encoding is lossy, meaning it may amplify or cancel out small differences. We’ll switch to JPEG output once we’ve normalized the settings and verified the outputs are the same. Here are the results of my first trial:

[Image: first test run, with FastScaling’s default sample window]

First, let me explain my test harness. The four images shown are the results of a single warm-up run for each test. Following the warm-up, I run each test 10 times serially and display the average and standard deviation of those 10 runs. Next I run 4 tests in parallel and show their average, standard deviation, and total elapsed time. Finally, I do the same for 8 in parallel. I’m running these on my dev laptop, which is a quad-core i7 with hyperthreading enabled. 8 threads is the theoretical max I can run simultaneously, so that’s as far as I go with my parallel testing. It should be good enough to give you an idea how each scaler… erm… scales… performance-wise. The visual results are kind of a sanity check to make sure the numbers reflect the work I thought I was doing in the test.
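For reference, the timing portion of a harness like that boils down to something like this; the resize delegate is a stand-in for whichever component is under test:

    using System;
    using System.Diagnostics;
    using System.Linq;

    static (double Avg, double StdDev) TimeSerial(Action resize, int runs = 10)
    {
        resize();                                         // warm-up run, not counted
        var times = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            resize();
            times[i] = sw.Elapsed.TotalMilliseconds;
        }

        double avg = times.Average();
        double stdDev = Math.Sqrt(times.Average(t => (t - avg) * (t - avg)));
        return (avg, stdDev);
    }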

Which brings me to the visual results of this run. We could talk about the performance numbers now, but I think it’s clear from the visual results that the work done was not the same for all cases, so let’s hold off on that. The WIC result looks pretty bad to me, with both aliasing and blurring. I’m running Windows Server 2012 R2 on this machine, so that’s actually the best we can do with WIC quality-wise. I’ll leave it there as a reference for just what kind of performance you can get if you’re willing to sacrifice quality for speed. We’ll focus on matching up the quality on the remaining three images, with the GDI+ one as the reference.

The GDI+ and MagicScaler results look the same to me, so I’ll check them in Beyond Compare’s picture compare tool to see how close they are. If you missed my last use of this tool, I have it set with a tolerance of 1, meaning pixels shown in grey are identical between the images, pixels in blue differ by 1 (out of 256), and pixels that differ by more than 1 are red.

[Image: Beyond Compare diff, MagicScaler vs GDI+ output]

Those are darn-near identical, and certainly close enough that you can’t see a difference with your human eyes. The only red pixels are near the edge, where GDI+’s weird ‘mirror’ treatment differs from my more normal ‘extend’ treatment of edge pixels. Considering the difference in implementations, I’ll call these a match visually. Note that for this test, I had all optimizations that would affect image quality disabled in MagicScaler. I also disabled its auto-sharpening. Again, the idea is to establish a baseline where all the components are doing the same work. Only then can we compare them fairly.

Now let’s see how the FastScaling output from that run compares to GDI+. Remember, the docs said its Cubic mimics DrawImage() ‘precisely’.

[Image: Beyond Compare diff, FastScaling vs GDI+ output]

Nope. The graphs told us it would be blurry, and I could tell just by looking at it that it was significantly more blurry (even more so than the WIC output). There’s the proof. So it really doesn’t matter how fast this was. It was wrong. As in not what I expected and not what I asked for.

At this point I was about ready to declare the FastScaling implementation ‘busted’ and give up, but I had one more look at the docs and spotted something that held promise:

&down.window=0.5..3 Default 1..3, depends on filter. This describes the support window (input pixel set) size relative to the output pixel's corresponding input area. Values of 1 will involve only corresponding pixels. Values of 2 will involve the corresponding area, plus half again on each side. Values of three will triple the number of input pixels.

This actually explains the problems we saw in the graphs. It appears the sample window is being set incorrectly initially, and it’s possibly being changed relative to the resize ratio. That’s why the ‘arms’ were cut off on those graphs. You see, the correct sample window for this cubic function is 4. That’s the width of the correct curves in the graphs above. But 4 isn’t even listed in the valid range of values for this option… I figured I’d try the max value allowed and added &down.window=3 to my config for ImageResizer. The image was still blurry, as one might expect. Next I went out on a limb and tried plugging in the correct value of 4:
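For the record, that makes the full settings string for this run the same set of options as before, with the window override appended:

width={w}&height={h}&mode=stretch&format=png&fastscale=true&down.speed=-2&down.filter=cubic&down.window=4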

[Image: test run with down.window=4]

Et voilà! Now the WIC image is the only one that looks bad. I’ll compare the FastScaling output to GDI+ again in Beyond Compare just to be sure my eyes don’t deceive me.

[Image: Beyond Compare diff, corrected FastScaling output vs GDI+]

Much better. This one also shows a few red pixels around the edges, but again, the GDI+ implementation is weird, so I’ll call this a match.

Quality Isn’t Cheap

Note that in fixing the image quality, the average serial processing time for FastScaling went up from 301ms to 384ms, a 27% increase. Initially, it was cheating by sampling fewer pixels than it should, and we wouldn’t have caught it without comparing visually. It’s a bit troubling that I had to set the window size to an undocumented value to get the correct visual result, but hopefully that’s just a bug and not an intentional cheat.

Before we look at the rest of the numbers, we need to make one more small change. Remember, we were using PNG output, and we don’t want to do that for anything other than visual comparisons. The PNG encoder is much slower than the JPEG one when processing a large or complex image, so it hides some of the scaler performance differences. Plus it’s not representative of what we’d do in the real world. Now that we know the scalers are producing the same quality results, I’ll re-run the tests with JPEG output (quality 90).
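For the GDI+/System.Drawing side, saving at JPEG quality 90 goes through EncoderParameters; roughly like this (the output path is a placeholder):

    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Linq;

    static void SaveJpeg(Bitmap bmp, string path, long quality = 90L)
    {
        // Find the built-in JPEG encoder and hand it an explicit quality setting.
        ImageCodecInfo jpeg = ImageCodecInfo.GetImageEncoders()
                                            .First(c => c.MimeType == "image/jpeg");
        using var args = new EncoderParameters(1);
        args.Param[0] = new EncoderParameter(Encoder.Quality, quality);
        bmp.Save(path, jpeg, args);
    }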

[Image: benchmark results with JPEG output]

There we go. Notice the numbers didn’t change a ton, but they all went down. It basically amounted to a 7-8ms difference in the serial runs. Where that really shows up, though, is in the WIC numbers. That 8ms difference represented a 22% speed-up in the WIC resizer. Seriously, that thing is fast.

I have just a few notes on the numbers at this point:

You may have noticed I’ve only been quoting the numbers from the serial portion of the tests. There are two reasons for that.

  1. We can only get a fair comparison with GDI+ on one thread since we know it’s non-reentrant. We’re still establishing a baseline at this point.
  2. My processor has Intel’s Turbo Boost and Hyper-Threading features enabled, so the performance on multiple threads will always be lower per thread than when only one is working at a time. The serial test represents the true best time for each scaler.

The parallel results are important for scalability, but in the interest of keeping the number of variables to a minimum, the serial tests are the most controlled and easily comparable.

Now that we’ve normalized the quality, FastScaling is very close to GDI+ in single-threaded performance. Its advantage is now down to less than 8%. That’s certainly nowhere near the 5-12x improvement we were teased with in the beginning. It does perform better running on multiple threads, but we’ll see later why you probably don’t want it doing that.

This test was mostly about comparing FastScaling and GDI+ on even terms, but check out those numbers on MagicScaler. Producing the same quality output, it’s significantly faster than both of the others. And we haven’t even opened up our bag of tricks yet. We’ll do that once we start enabling some of the FastScaling shortcuts, to keep it fair.

Wrapping up, let’s go back to what I originally said about benchmarks. I set out to show that FastScaling really isn’t that much faster than GDI+, and with a few tweaked variables, I think I’ve done that. It is still faster in this case, but in the next post we’ll see some cases where it’s not. And if you want to talk about big numbers, you can grab some out of this test as well. WIC came in over 53x faster than GDI+ in this last test @ 8 threads. That’s end-to-end processing, compared against an optimized GDI+ implementation. Of course I traded quality for that big number, but so did the benchmarks from the FastScaling page, and my number is bigger. Neener, neener.

In the next post I’ll take a stab at explaining why GDI+ came in last in this benchmark, and I’ll reveal FastScaling’s dirty little secret that doesn’t show up in the timing numbers. Stay tuned…

Oh, and one more thing. I keep typing ‘FastScaler’ instead of ‘FastScaling’, and now I notice I labeled it that way in my benchmark app. Mea culpa. Just in case there was any confusion, I was using the right component. I’ll fix the label going forward, but I don’t feel like redoing any of the benchmarks to correct the images I’ve already done.