Lies, Damned Lies, and Benchmarks Part 3: Varying Variables

May 13. 2016 0 Comments

This is the final part of my review of the FastScaling plugin for ImageResizer. Over the first two parts of this series, we examined some of the performance claims made by the FastScaling documentation. To review, those claims could be grouped into three categories:

It claimed that its orthogonal processing was more efficient than DrawImage()’s ‘general distortion filter’. That was true, but other architectural deficiencies cancel out that benefit in many cases. We saw that at equivalent output quality and on a single thread, FastScaling doesn’t offer much, if any, improvement over optimized DrawImage() usage. Its native RGB processing is more efficient, but even with that advantage, it barely eked out a win in our JPEG test. With other container formats, results may vary. With other pixel formats, it does significantly worse than GDI+.
It claimed to break free of the single-thread limit imposed by DrawImage(), allowing it to scale up with more more processors/cores. That was also true. But we saw the cost of that is that it’s allowed to run away with your server resources. Memory is particularly hard-hit since FastScaling seems to require even memory per image than DrawImage() does.
It claimed performance improvements through dynamic adjustment of scaler settings and through what they call ‘averaging optimizations’. We have not yet explored these.

Point 2 above could easily be the end of the road for this series. It’s a deal-breaker for true scalability. I certainly wouldn’t let FastScaling anywhere near any of my servers. But I’m still curious about that last point. I do some dynamic adjustments of quality settings in MagicScaler as well, and I’m interested to see how they compare.

I’m also curious as to how they arrived at such impressive numbers in their benchmarks. Nothing I’ve seen indicates FastScaling is anywhere near as fast as they say, but I’d like to see if I can get close to some of those numbers or at least explain how they got them. I came up with my own baseline for my own tests, but I might need to reset that baseline if I’m going to match theirs.

Narrowing the Scope

Beyond the baseline problem, there’s a problem of variables. I showed how limiting benchmarks to a single variable at a time makes them much more educational. Likewise, carefully choosing those variables can allow you present a distorted view of reality. I’d like to see if I can determine how they arrived at theirs, and why. Right off the bat, there are several to consider, such as:

Input image size, container format and pixel format
Output image size, container format and pixel format
Interpolation method and parameters (this can be extremely complex and variable itself)
Shortcuts, such as in-decoder transformations, or intermediate processing

JPEG input and output are clearly the most representative of a real-world web workload, so that part is a no-brainer. As for the input image size, I mentioned before that a larger input image exaggerates the performance difference in the scalers. I used a 24MP image for my initial tests, but the 16MP input used in the FastScaling benchmarks is also reasonable for those purposes. I’ll go ahead and switch to that size now. We’re also going to be doing only RGB (YCbCr, actually) JPEGs since they’re most typical.

The image I chose for this round of tests comes from the USGS flickr. The original file had an embedded Nikon ICC profile, which adds considerable processing time to the decode/conversion step. This would make things particularly unfair when using MagicScaler’s ability to resize within the decoder, so in order to keep the workload as similar as possible for all the scalers, I converted the image to the sRGB colorspace in Photoshop and re-saved it without an embedded profile for these benchmarks. The converted file is here

So the first real decision we have to make is output size. It has to be something realistic for web scenarios, but beyond that, it doesn’t seem like all that important a choice. I chose an output width of 400px for my earlier tests simply because I find that size easy to manage. I can do screenshots of my test app without them being too big, and I can easily take in all of the images in a single glance so differences in visual results are easy to spot. The FastScaling benchmarks used 800px output, and I wondered whether there was a reason for that. If you saw my earlier benchmarks between ImageResizer’s GDI+ implementation and my own reference GDI+ resizer, you may remember that at larger output sizes, the sub-optimal GDI+ settings used by ImageResizer made it significantly slower. I wondered if that handicap would make FastScaling look better by comparison, so I ran a few tests using my baseline settings from Part 1 of this series. The idea here is to keep them on even ground and change only the output size variable for now.

fsbaseline16mp

What’s interesting here is that the two scalers in ImageResizer follow a completely different trajectory than the reference GDI+ resizer and MagicScaler. ImageResizer is clearly paying a performance penalty at larger output sizes, but that penalty is paid by both of its scalers. There doesn’t appear to be any special reason they chose the 800px output size. In fact, at that size, FastScaling is actually slower than the reference GDI+ resizer. It is noteworthy that FastScaling beats ImageResizer’s GDI+ implementation at all output sizes, but the margin is modest, at a relatively constant 40-50ms. By comparison, MagicScaler maintains a steady 120-130ms advantage over the reference GDI+ resizer.

With these results in mind, I don’t think it’s at all unfair to stick with my preferred 400px output width for the remaining benchmarks. FastScaling actually holds a slight edge over the reference GDI+ resizer at that size, and we’ll have an easier time comparing output quality once we start enabling some of the processing shortcuts that FastScaling and MagicScaler support. This is the new baseline I’ll be using going forward.

fsbaseline16mpjpeg

Speaking of Quality…

Before I start sacrificing quality for speed in these comparisons, there’s one last topic I want to visit from the FastScaling documentation. Beyond the performance claims made in the docs, they also claim to have improved quality over DrawImage().

Another failing of DrawImage is that it only averages pixels in the sRGB color space. sRGB is a perceptual color space, meaning that fewer numbers are assigned to bright colors; most are assigned to shades of black. When downscaling (weighted averaging), this tends to exaggerate shadows and make highlights disappear, although it is just fine when upscaling.
FastScaling defaults to working in the srgb color space too - but only because users expect DrawImage-like behavior, not because sRGB is better. Linear light is almost always a better choice for downscaling than sRGB, and is more 'correct'.

These statements about processing light values in the sRBG compressed domain are true. It’s a bit of an oversimplification, but Eric Brasseur has written an excellent piece on the topic if you want more detailed info. I was interested by the statement in the second paragraph that FastScaling chooses sRGB processing as a default only because that’s what people expect, especially in light of all the performance claims made. Processing in linear light is better, but it’s always more expensive, and I wonder just what kind of performance hit FastScaling takes to do it. We saw in the last test, FastScaling barely beat the reference GDI+ resizer at 400px output from a 16MP JPEG source. Let’s do that same test again but enable linear light processing in FastScaling this time. Oh, and in MagicScaler too, because of course it supports linear processing as well…

fslinearjpeg

As you might have guessed, FastScaling gave up its meager lead with the added processing. It’s now over 200ms slower than the GDI+ reference, while MagicScaler is still almost 100ms faster. The difference in quality is quite subtle in this image, but it can be more pronounced in images with small high-contrast areas. Here’s a better example using an untouched 17MP image of the Milky Way, also from the USGS flickr.

fslinearjpeg2

WIC looks worst (as usual) here, but both FastScaling and MagicScaler look worlds better than the best GDI+ can do with this image. And with roughly the same input image size, performance is about the same as the previous test after accounting for the increase in decoding time. FastScaling is ~200ms slower than GDI+, and MagicScaler is ~100ms faster. So while FastScaling is sometimes better or faster than GDI+, it’s most certainly not both.

I feel the need, the need for speed

Ok, with that last quality issue addressed and with a good baseline established, we can start to play with some of the options that sacrifice quality for processing speed. GDI+ is obviously going to be quite limited in this regard, as we can really only change the interpolation mode to get better performance. However, as I suggested in Part 1 of this series, the ‘averaging optimizations’ mentioned in the FastScaling docs are also possible to implement with DrawImage(). I call it Hybrid Scaling in MagicScaler, so I’ll use that term from now on.

The reason it’s possible to do such a thing with DrawImage() is because we happen to know (from my earlier analysis of the GDI+ interpolation modes) that the default Linear interpolation mode from GDI+ adapts toward a Box (or averaging) filter at higher downscale ratios. We also saw in my earlier testing with GDI+ that the Linear interpolation mode doesn’t require a conversion to RGBA to do its work and doesn’t require the entire source image to be decoded into memory all at once. That makes this technique particularly interesting in GDI+, because we can reduce memory usage while at the same time increasing speed. I went ahead implemented hybrid scaling in my reference GDI+ resizer (it took all of about 10 minutes), so we can see what GDI+ can do under the best of conditions. We’ll compare that with the best speeds FastScaling and MagicScaler can achieve. We’ve already seen that in terms of raw speed and efficiency, WIC is going to be impossible to beat, but there isn’t really anything we can do to make it faster or slower, so I’ll drop it from my benchmarks at this point. The best it did on my reference image was 54ms. We’ll keep that number in mind.

The FastScaling docs are light on details regarding its speed vs quality tradeoffs, but it appears they’re all driven with the down.speed setting. MagicScaler allows control of its quality tradeoffs with its HybridMode setting, which has 4 options. The first option is ‘Off’, which is what we’ve done so far. The other 3 modes (FavorQuality, FavorSpeed, and Turbo) allow MagicScaler to resize the source by powers of 2 using either the decoder or the WIC Fant scaler (which is somewhere between a Linear and Box filter) before finishing with its own high-quality scaler. The 3 options control how far the low-quality resizing is allowed to go.

FavorQuality allows low-quality scaling to the nearest power of 2 at least 3x the target size.
FavorSpeed allows low-quality scaling to the nearest power of 2 at least 2x the target size.
Turbo allows low-quality scaling to the nearest power of 2 to the target size. When resizing by a power of 2 up to 8:1, it is equivalent to the WIC scaler I’ve benchmarked so far.

The Hybrid mode I added to my reference GDI+ resizer follows the same rules but uses the GDI+ Linear scaler to do its low-quality phase. From this point on, I’ll have to abandon the idea that we can reach equivalent output, so we’ll be stuck with more subjective comparisons for quality. And Away we go…

fsspeed016mp

Quality looks to be pretty even at this point. The hybrid scaling version of my GDI+ resizer knocked 115ms off the normal GDI+ time, but FastScaling and MagicScaler both did better. Note that I’m moving the FastScaling down.speed setting up by 2 at a time since it has a total of 7 levels to MagicScaler’s 4. I’ve also left the down.window=4 setting in place for the FastScaling tests since I believe that setting’s default value was a bug. I’ll allow it to use its default value when we test the maximal speed of each component. And finally, note that MagicScaler is using the JPEG decoder to perform part of the scaling, so its speed is approaching that of the WIC scaler already. Next level up…

fsspeed216mp

Looks like nothing really changed here. MagicScaler’s logic used an intermediate ratio of 4:1 on both this test and the last, so the work done was the same. It appears FastScaling might have also used the same settings for both of these runs. And now the fastest settings:

fsspeed416mp

With this setting, MagicScaler is using an 8:1 intermediate ratio, and the speed is within 2ms of the pure-WIC times we saw earlier. The image is noticeably more blurry now, but doesn’t seem to be as bad off as the FastScaling one. No matter, though, FastScaling barely beats out the hybrid version of the GDI+ resizer in single-thread performance. But that’s probably not the best FastScaling can do performance-wise. I’ll do one final test, changing its down.filter value to ‘Fastest’ and removing its down.window setting, while leaving the down.speed=4 setting. As far as I can tell from the docs, this should be its best speed.

fsspeed4fastest16mp

That shaved a few milliseconds off the FastScaling number, but it’s probably within the margin of error. Its visual quality is by far the worst now.

You may notice I changed other one thing in this test while I was at it. Since I had already maxed out MagicScaler’s speed, in this test I enabled its automatic sharpening. You can see here that it added only 2ms to the processing time, but the results are quite striking. MagicScaler is showing nearly 3x the speed of FastScaling and better quality to boot. In fact, the MagicScaler result looks better than GDI+ at 5x the single-threaded performance or 25x the performance on 8 threads.

As for FastScaling’s numbers vs GDI+, the biggest number we’re showing here is 8.3x faster than GDI+ when running on 8 threads. That’s actually within the 4.5-9.6x end-to-end speed range quoted in the FastScaling benchmarks. The problem is, those numbers are with its lowest quality settings, which are unacceptably poor. And it used over 400MiB of RAM during the test, which is unacceptably poor for scalability. The hybrid scaling in my GDI+ reference dropped its memory use to 13MiB from the baseline version’s 64MiB, by the way, and its single-threaded performance numbers were very close to FastScaling’s best while producing better quality.

fsdirty2

I think I’ve proven my point. FastScaling’s performance claims are way overblown, and MagicScaler is in a completely different league.

Oh, and there’s one more thing:

This plugin (FastScaling plugin) is part of the Performance Edition
The Performance edition costs $249 per domain

Ha! Did I mention MagicScaler is free?

PhotoSauce Blog

Musings on Imaging, Software, and anything else

Lies, Damned Lies, and Benchmarks Part 3: Varying Variables

Narrowing the Scope

Speaking of Quality…

I feel the need, the need for speed