Cloudinary Blog

A one-color image is worth two thousand words

By Jul 27, 2016

Compressing a single-color image in various formats

In part one (One pixel is worth three thousand words) of this turned-to-be-two-part blog post, I discussed one-pixel images and how well different image formats “compress” these images. I was surprised how much there is to be said about the matter. This was supposed to be a short blog post, describing one-pixel images and how they compress, and instead it became a glorious monster (and also a two part blog post…).

Seems that my fellow image compression enthusiasts were as excited as I am about this subject! This blog post sparked a fascinating Reddit discussion, in which other formats like BMP and TIFF are considered, and a 141-byte one-pixel JPEG file was given (smaller than the 160-byte JPEG file that I thought was minimal).

With that in mind, I’m happy to introduce part two, where we’ll look at single-color images and examine them in length. I hope you find it interesting and would love to hear your thoughts and insights!

The most predictable image

As we saw in part one, single-pixel images are the worst-case scenario for image compression: they’re all headers and overhead, and very little data. You can’t really compress anything, since compression depends on predictability, and how are you supposed to predict one single pixel?

In this second part of the blog post, we go to the other extreme: extremely predictable images.

The most predictable image is a large rectangle in a single color. In other words, a scaled-up version of the one-pixel images I discussed in part one. An empty canvas, if you want. A blank sheet.

Compression algorithms should be really good at compressing an image where every pixel is the same color. It’s the best-case scenario, the ultimate in predictability – once you’ve seen the first pixel, you’ve seen them all.

Let’s take a single white pixel and scale it up to increasingly larger squares, and see how the various image formats perform. I wrote a little script to do just that. Here’s the result:

Since we’re looking at squares, the number of pixels is, well, the square of the number on the x-axis: as the width of the square goes from 1 to 5000 pixels, the size of the image goes from 1 pixel to 25 megapixels. So it’s not surprising that some of the curves look like quadratic functions. The uncompressed PBM format obviously has a file size that is (asymptotically) linear in the number of pixels (1 bit per pixel in this case). But JPEG and lossy WebP are also linear in the number of pixels (quadratic in the width of the square) – just with a better constant factor. In other words, they seem to have some inevitable cost per pixel. For JPEG, it looks like you need at least 2 bits per 8x8 macroblock.

This means that JPEG and lossy WebP have a theoretical limit on the compression ratio you can achieve with them: you can’t do better than 0.031 bits per pixel (for JPEG) or 0.014 bits per pixel (for lossy WebP).

Most of the other formats do not seem to have such a limitation. They do not look like a quadratic curve in the above plot. Let’s see what we have here.

GIF has a rather peculiar curve: it keeps going in a straight line for quite a while, but then it’s like it ‘shifts gears’ twice: first at some point just before 3000x3000, and then again just after 4000x4000. I don’t know the explanation for this behavior.
The PNG curve goes in a more or less straight line, with some ‘coughs’ and ‘jumps’ around powers of two (1024, 2048, 4096) which might be due to the changing behavior of the underlying zlib compression at such boundary points.
Lossy BPG and lossless BPG are quite close to one another – you can tell that these are just two different settings of the same compression algorithm, in contrast to WebP where the lossy and the lossless variant are completely unrelated algorithms. The weird thing with BPG is that the file size fluctuates up and down quite strongly as the image gets larger. It’s easier to see this if you zoom in on part of the plot:

For example, a 568x568 white square takes 232 bytes as a lossless BPG, while a slightly larger 569x569 white square takes only 144 bytes. Weird. * Lossless WebP has a nice low curve: a 25 megapixel white square (5000x5000) still only takes 1012 bytes. * FLIF is the king of this competition: it doesn’t matter what the image dimensions are, if all pixels are white, the FLIF file will be 19 bytes. This is a side-effect of a more general, quite simple compression technique FLIF uses. For each color channel (in this case there is only one: luma), the header can optionally contain a lower bound and an upper bound on the values that actually occur in the image. By reducing the range of values, better compression can be achieved. In this case, the range is reduced to one single value (white, or 255 if you express it as an 8-bit number). Once you know that this is the range, every pixel becomes as cheap as it can be: exactly zero bits.

In the plots above, it’s still a bit hard to see the asymptotic behavior of the various formats. Let’s plot the data in a different way: let’s look at the relative file size, i.e. the bits needed per pixel. Instead of looking at the number of bits per pixel, which would be very small, let’s look at the number of pixels per bit. The bang for the buck, so to speak. Here is a plot of the compressed white squares, with pixels per bit on the (logarithmic) Y-axis:

As expected, uncompressed PBM very quickly converges to 1 pixel per bit, as the overhead of the header and padding becomes less and less important. JPEG and lossy WebP also quite quickly converge to what appear to be their theoretical limits – 32 pixels per bit, and just above 70 pixels per bit, respectively.

GIF is more interesting: it keeps getting more and more pixels per bit, until it almost gets to 200 pixels per bit. It then seems to repeatedly bump its head against that ceiling.

PNG quite rapidly goes over that 200 ppb ceiling and keeps getting better, though above about 1000x1000, the improvement gets much slower and bumpier.

BPG and lossless WebP behave in a similar way: first they get better and better very rapidly, but then improvement gets slower and the pixels per bit curve becomes nearly horizontal. They can both achieve more than 1000 pixels per bit.

FLIF ‘cheated’ by not having to encode anything at all, so obviously it rapidly grows and keeps growing.

How far can we take this? Well, the largest image dimension that is supported by all of the above formats, is 16383x16383 pixels (268 megapixels). Let’s see what we get for this huge white square:

Format	Bytes	Pixels per bit
PBM	33,552,399	1.00
PNG	53,132	631.45
GIF	182,225	184.11
JPEG	1,048,737	31.99
Lossy WebP	477,334	70.29
Lossless WebP	10,368	3,235.95
Lossy BPG	16,734	2,004.90
Lossless BPG	18,032	1,860.60
FLIF	19	1,765,807

Would there be any difference if the huge square was black instead of white? Let’s look at the pixels-per-bit plot.

And here are the numbers for a huge 268-megapixel black square:

Format	Bytes	Pixels per bit
PBM	33,552,399	1.00
PNG	32,645	1,027.73
GIF	182,225	184.11
JPEG	1,048,737	31.99
Lossy WebP	477,334	70.29
Lossless WebP	10,366	3,236.57
Lossy BPG	16,734	2,004.90
Lossless BPG	18,032	1,860.60
FLIF	15	2,236,689

It turns out that for some formats, there actually is a difference between black and white squares, and black is always more compact than white. In PNG, it makes quite a big difference: a huge white square is 53 KB, while a huge black square is only 32 KB. Also, the pixels-per-bit curve looks much smoother for black squares than for white squares.

You might wonder what causes this difference between white and black in PNG. The explanation is that the image data in a PNG is basically PNM+zlib, with one important difference: a “filter” can be applied to make the data compress better. PNG supports five filter types (None, Sub, Up, Average and Paeth), and each image row can have a different filter. Every row gets prefixed with one extra byte which indicates the filter type for that row.

Now what is the best way to encode a fully white image in PNG? Well, it does not really matter much which filter type you use – whichever you pick, the actual filtered data bytes will either be mostly 0 or mostly 255. The thing is, if you pick filter type None (encoded as 0), then the filtered data is all 255, while if you pick any other filter type (encoded as 1 to 4), then the filtered data becomes mostly all zeroes. In any case, you don’t get a big sequence of identical bytes – there is an interruption at the beginning of each row. You get a repetitive pattern, with a length that depends on the width of the image. This explains the jumps around widths that are a power of two: those are the points where zlib needs more bits to represent the LZ77 distances.

A fully black image is easier: you can just pick filter type 0. The filtered data is all zeroes whatever you do, so the data to be compressed is one big sequence of only zeroes. And that is pretty much the best-case scenario for zlib, so you get pretty close to zlib’s theoretical limit, which is a 1032:1 compression ratio.

A fully transparent image gets more or less the same compression as a fully black image, for the same reason: it’s one big sequence of zeroes (just a longer sequence because there’s the extra alpha channel).

If you examine the 32 KB huge black square PNG file, you’ll notice that even though it is already compressed over a thousand times, the resulting compressed data is still very repetitive. In other words, the compressed data can be compressed even further. Indeed: running gzip on this 32 KB PNG file produces a .gz file of just 225 bytes.

Wow. This was a pretty long and technical blogpost, and yet we’ve only covered the simplest possible kind of images: first one single pixel, and then the same single pixel scaled-up to larger and larger squares. While these are certainly not the most useful, nor the most representative types of images, they can still tell us something about the limits of the image formats which we have considered: PNG, GIF, JPEG, uncompressed PNM/PAM, lossy and lossless WebP and BPG, and FLIF. We’ve seen the worst-case and the best-case behavior of these image formats: from 160 bytes for a single pixel, all the way down to 15 bytes for 268 megapixels.

If you made it this far, congratulations! I hope you learned something new while reading all of this. I sure did while writing it!

Recent Blog Posts

Supercharge Shopify Conversions and Revenue with Cloudinary

By Aug 19, 2022

Our $2B Valuation

By Feb 17, 2022

When we started our journey in 2012, we were looking to improve our lives as developers by making it easier for us to handle the arduous tasks of handling images and videos in our code. That initial line of developer code has evolved into a full suite of media experience solutions driven by a mission that gradually revealed itself over the course of the past 10 years: help companies unleash the full potential of their media to create the most engaging visual experiences.

Compelling Visual Experiences Are Imperative for DTC E-Commerce

By Pradip Lal Feb 02, 2022

Direct-to-Consumer E-Commerce Requires Compelling Visual Experiences

When brands like you adopt a direct–to-consumer (DTC) e-commerce approach with no involvement of retailers or marketplaces, you gain direct and timely insight into evolving shopping behaviors. Accordingly, you can accommodate shoppers’ preferences by continually adjusting your product offering and interspersing the shopping journey with moments of excitement and intrigue. Opportunities abound for you to cultivate engaging customer relationships.

Automatically Translating Videos for International Outreach

By Jan 26, 2022

Automatically Translating Videos for an International Audience

No matter your business focus—public service, B2B integration, recruitment—multimedia, in particular video, is remarkably effective in communicating with the audience. Before, making video accessible to diverse viewers involved tasks galore, such as eliciting the service of production studios to manually dub, transcribe, and add subtitles. Those operations were costly and slow, especially for globally destined content.

Minted Delivers High-quality Art at Scale with Cloudinary

By Jan 25, 2022

Cloudinary Helps Minted Manage Its Image-Generation Pipeline at Scale

Shoppers return time and again to Minted’s global online community of independent artists and designers because they know they can count on unique, statement-making products of the highest quality there. Concurrently, the visual imagery on Minted.com must do justice to the designs into which the creators have poured their hearts and souls. For Minted’s VP of Engineering David Lien, “Because we are a premium brand, we need to ensure that every single one of our product images matches the selected configuration exactly. For example, if you pick an 18x24 art print on blue canvas, we will show that exact combination on the hero images in the PDF.”

ImageCon 2021 by the Numbers: Fun Facts and Event Highlights

By Jan 24, 2022

Highlights on ImageCon 2021 and a Preview of ImageCon 2022

New year, same trend! Visual media will continue to play a monumental role in driving online conversions. To keep up with visual-experience trends and best practices, Cloudinary holds an annual conference called ImageCon, a one-of-a-kind event that helps attendees create the most engaging visual experiences possible.