Cloudinary Blog

Improve the Web Experience With Progressive Image Decoding

Progressive Image Decoding Delivers an Enhanced Web Experience

Progressive image decoding is an excellent way to accelerate page loads and thereby improve the web-browsing experience. This post explains why and describes recent developments in that approach.

The Importance of Image Compression

Some people say that, since internet speeds keep getting faster, we don’t really need better image compression. They believe that JPEG is good enough and, in particular, that progressive decoding belongs to the past: it mattered for web surfing in the early 1990s over slow dial-up modems, which are no longer in use.

I think those people are wrong. Yes, the internet is faster. However, not everyone has high-speed internet, and even those who do at home or at work can’t access it at all times, for example while they’re traveling. Moreover, faster internet has led to heavier websites: the web has become far more visual, with ever more and larger images and videos. Images represent a large amount of data: every pixel consists of at least three numbers (R, G, and B), each requiring at least 8 bits. So, without compression, 1 megapixel equals 3 megabytes. Given that the median webpage contains 2.1 MP worth of images, sending them uncompressed over a 3-Mbps 3G connection would take at least 17 seconds. That’s a long wait!
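To make that arithmetic concrete, here is a minimal Python sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 Mbps = 10^6 bits per second) and ignores latency and protocol overhead, so real transfers would be slower; the function names are hypothetical.

```python
def raw_image_bytes(megapixels: float, channels: int = 3, bits_per_sample: int = 8) -> float:
    """Uncompressed size: pixels x channels x bits per sample, converted to bytes."""
    return megapixels * 1_000_000 * channels * bits_per_sample / 8


def transfer_seconds(num_bytes: float, mbps: float) -> float:
    """Idealized transfer time over a link of `mbps` megabits per second."""
    return num_bytes * 8 / (mbps * 1_000_000)


page_bytes = raw_image_bytes(2.1)  # 2.1 MP of images on the median page
print(f"{page_bytes / 1e6:.1f} MB, {transfer_seconds(page_bytes, 3):.1f} s on 3 Mbps")
# prints: 6.3 MB, 16.8 s on 3 Mbps
```

Passing `bits_per_sample=10`, as a wide-gamut or HDR image would need, raises the raw size by another 25 percent.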

Plus, we want high-resolution images, as well as images with a wide color gamut and a high dynamic range, which require more than 8 bits per sample. Bottom line: image compression remains a must-do.

In essence, lossless image compression is simple: find a more concise representation that, in the end, stands for exactly the original pixel values that went in. The hard part is finding that representation. For typical photographs, lossless compression yields a compression ratio of only 2:1, or maybe 3:1, which translates to 1 megapixel in 1 megabyte instead of 3. Not bad, but not good enough.

Remarkably, lossy compression can easily deliver ratios of 20:1 with no visible artifacts. Ideally, the only differences between the original and the decoded pixel values are numerical ones: unless you zoom in a lot, the images look the same, yet lossy compression brings a 1-megapixel image down to a much more manageable 150 KB.

Remember, you compress online images to improve the browsing experience. Data caps aside, file sizes matter because they determine how long users must wait to see your images. The smaller the files, the faster the images appear and the more pleasing the user experience.

Hence the promise of progressive decoding, which enables browsers to display image content before the files have finished loading.

Progressive Decoding

What’s progressive decoding? Clever image codecs organize the compressed bits so that even a partially loaded image (say, 10 percent) can be decoded into a lower-quality or lower-resolution preview. The 30-year-old JPEG format can do that, but the feature is optional and underused: only some advanced encoders, such as mozjpeg, enable it by default.
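Because progressive mode is optional, it can be useful to check whether a given JPEG actually uses it. A progressive JPEG declares its frame with a progressive start-of-frame marker (SOF2, bytes `FF C2`, in the common Huffman-coded case), whereas a baseline JPEG uses SOF0 (`FF C0`). The minimal Python sketch below, with a hypothetical function name, walks the marker segments up to the first frame header; it does not handle every edge case in the standard.

```python
from typing import Optional

# Start-of-frame markers for progressive DCT coding:
# SOF2 (Huffman), SOF6 (differential Huffman), SOF10 (arithmetic), SOF14 (differential arithmetic).
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}
# The remaining SOF markers (0xC4 DHT, 0xC8 JPG, and 0xCC DAC are not frame headers).
OTHER_SOF = {0xC0, 0xC1, 0xC3, 0xC5, 0xC7, 0xC9, 0xCB, 0xCD, 0xCF}


def is_progressive_jpeg(data: bytes) -> Optional[bool]:
    """True/False based on the first SOF marker, or None if none precedes the scan data."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG (missing SOI marker)")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            i += 1
            continue
        if marker in PROGRESSIVE_SOF:
            return True
        if marker in OTHER_SOF:
            return False
        if marker == 0xDA:  # SOS: entropy-coded data follows, no frame header found
            return None
        # Other segments carry a big-endian length that includes the two length bytes.
        length = int.from_bytes(data[i + 2:i + 4], "big")
        i += 2 + length
    return None
```

To convert an existing baseline JPEG into a progressive one without further quality loss, libjpeg’s `jpegtran -progressive` rewrites the scans losslessly.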

Progressive decoding can improve the browsing experience by another order of magnitude: lossy compression reduces a 3-MB uncompressed image to 150 KB, and progressive decoding then displays a preview after a mere 15 KB has arrived. To see the fine details, you must wait until the transfer is complete. However, if you’re just scrolling through the webpage, chances are that the preview gives you a good idea of the image. For the median webpage, lossy compression shortens the 17-second image-loading time to about one second, and progressive decoding can make the images appear almost instantly.
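The Python sketch below reproduces those numbers for the median page. It assumes 3 bytes per pixel and a 3-Mbps link, as in the text, plus decimal units and no latency or protocol overhead. It treats the first progressive preview as arriving after 10 percent of the compressed bytes, matching the 15-KB-of-150-KB example; the result is an idealized estimate, not a measurement.

```python
RAW_BYTES_PER_MP = 3_000_000  # 3 bytes per pixel, uncompressed
MEDIAN_PAGE_MP = 2.1          # megapixels of images on the median page
LINK_MBPS = 3                 # 3G connection
LOSSY_RATIO = 20              # 20:1 lossy compression
PREVIEW_FRACTION = 0.10       # ~15 KB of a 150-KB progressive image


def seconds(num_bytes: float) -> float:
    """Idealized transfer time (no latency, no protocol overhead)."""
    return num_bytes * 8 / (LINK_MBPS * 1_000_000)


raw = MEDIAN_PAGE_MP * RAW_BYTES_PER_MP
lossy = raw / LOSSY_RATIO
preview = lossy * PREVIEW_FRACTION

for label, size in [("uncompressed", raw), ("lossy, complete", lossy), ("lossy, first preview", preview)]:
    print(f"{label:>20}: {size / 1000:6.0f} KB in {seconds(size):6.3f} s")
```

The three stages come out at roughly 16.8 s, 0.84 s, and 0.084 s, one order of magnitude apart at each step.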

Image Versus Video

For video codecs, progressive decoding of a single frame is a waste of time. That’s because videos contain many frames, displayed in rapid succession, and you must buffer enough of the compressed video data before it makes sense to start playback.

Nonetheless, many new image formats are derived from video codecs: lossy WebP is essentially a single VP8 video frame, HEIC a single HEVC frame, and AVIF a single AV1 frame. Because of their video origins, they don’t support progressive decoding. Too bad: even though those formats can reach higher compression densities, you must wait until all or most of the image data has loaded before you can see anything.

As a result, even if AVIF’s superior compression could, for example, turn a 150-KB JPEG into a 75-KB AVIF, the first preview might paradoxically take about four times longer to display. A reasonably promising preview of the progressive JPEG is available once 20 KB has loaded, whereas for the AVIF you must wait for all 75 KB to arrive and be decoded. Besides, the more complex AVIF format takes longer to decode than JPEG.

Previews and Placeholders

To use nonprogressive formats like WebP and AVIF and still provide a somewhat progressive browsing experience, use Low Quality Image Placeholders (LQIPs): you first serve a low-quality version of each image and then replace it with the actual one, for example with JavaScript.

The spectrum is wide, ranging from mere placeholders (really, really low-quality previews, e.g., a simple gradient of two predominant colors or a very blurry version of the image based on a dozen pixels only) to low-quality previews that can clue users in on the images, such as “quality 30” images as previews for the actual “quality 80” ones. In the case of AVIF and JPEG XL, you can embed LQIPs, saving the step of replacing the image externally.
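At the “mere placeholder” end of that spectrum, a blurry preview based on about a dozen pixels can be made by averaging the image into a tiny grid. This Python sketch assumes the image is already decoded into rows of (R, G, B) tuples; the function name and the 4x3 grid size are illustrative choices, not a standard.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # rows of (R, G, B) pixels


def tiny_placeholder(img: Image, out_w: int = 4, out_h: int = 3) -> Image:
    """Average `img` into an out_w x out_h grid (4 x 3 = a dozen pixels by default)."""
    h, w = len(img), len(img[0])
    if h < out_h or w < out_w:
        raise ValueError("image is smaller than the placeholder grid")
    grid: Image = []
    for ty in range(out_h):
        y0, y1 = ty * h // out_h, (ty + 1) * h // out_h
        row: List[RGB] = []
        for tx in range(out_w):
            x0, x1 = tx * w // out_w, (tx + 1) * w // out_w
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Mean of each channel over the tile, rounded back to an integer sample value.
            r, g, b = (round(sum(p[c] for p in block) / len(block)) for c in range(3))
            row.append((r, g, b))
        grid.append(row)
    return grid
```

A page would then stretch those 12 pixels to the full image size, typically with a blur such as CSS `filter: blur()`, until the real image has loaded.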

The downside of separate previews or placeholders is that the total transfer size inevitably goes up. The final image takes longer to arrive, which erodes the benefit of showing a preview, and all the bytes spent on the separate, redundant preview or placeholder are ultimately wasted. The smaller the LQIP, the lower its overhead, but also the less useful it is as a preview.

In contrast, progressive decoding does not waste bytes on separate previews: the first bytes of the actual high-quality image are the preview image. Talk about a welcome feature!

Improved Progressiveness

The state of the art of progressive images, which are as old as JPEG, has remained largely the same for 20 or 30 years. Excitingly, that’s starting to change.

First, the “green martians” I blogged about before, which can appear when the first luma (brightness) and chroma (color) information is not available at the same time, are no longer an issue: browsers now wait until both chroma channels are available before showing a preview.

[Image: first progressive scan]

Another recent improvement concerns the upsampling used to display the first preview of a progressive JPEG, which is an image at 1:8 resolution. Basically, for every 8x8 block only one value is available: the block’s average color, which corresponds to the direct current (DC) coefficient. The simplest possible upsampling fills each 8x8 block with its DC value, which yields a very blocky preview, as shown here:
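This Python sketch shows both steps for a single channel: computing the 1:8 preview from block averages, then the simplest (blocky) upsampling. It assumes image dimensions that are multiples of 8 and uses plain lists for clarity; a real decoder reads the DC values from the first scan rather than computing them from pixels.

```python
from typing import List

Channel = List[List[float]]  # one color channel: rows of sample values


def dc_preview(channel: Channel, block: int = 8) -> Channel:
    """1:8 preview: the mean of each 8x8 block.

    In JPEG's DCT, a block's DC coefficient is proportional to this mean
    (8 times the mean of the level-shifted samples, before quantization).
    """
    h, w = len(channel), len(channel[0])
    if h % block or w % block:
        raise ValueError("this sketch assumes dimensions that are multiples of the block size")
    return [
        [
            sum(channel[y][x] for y in range(by, by + block) for x in range(bx, bx + block)) / block**2
            for bx in range(0, w, block)
        ]
        for by in range(0, h, block)
    ]


def blocky_upsample(preview: Channel, block: int = 8) -> Channel:
    """Simplest upsampling: fill every 8x8 block with its DC value."""
    return [[v for v in row for _ in range(block)] for row in preview for _ in range(block)]
```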

[Image: blocky preview from the simplest upsampling technique]

Now reaching browsers is an improved upsampling technique, which creates a more appealing preview with fewer artifacts:
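The post doesn’t spell out which improved technique browsers adopted, so the Python sketch below plainly swaps in a generic one: bilinear interpolation between block centers. It only illustrates why smoothing the DC grid removes the hard 8x8 edges; actual decoders may use a different method (libjpeg, for example, exposes a separate `do_block_smoothing` option for the early scans of progressive files).

```python
from typing import List

Channel = List[List[float]]  # one color channel: rows of sample values


def bilinear_upsample(preview: Channel, block: int = 8) -> Channel:
    """Upsample a 1:block preview by bilinear interpolation between block centers."""
    ph, pw = len(preview), len(preview[0])
    out: Channel = []
    for y in range(ph * block):
        # Position of this output pixel's center in preview coordinates, clamped at the borders.
        fy = min(max((y + 0.5) / block - 0.5, 0.0), ph - 1.0)
        y0 = int(fy)
        y1 = min(y0 + 1, ph - 1)
        wy = fy - y0
        row: List[float] = []
        for x in range(pw * block):
            fx = min(max((x + 0.5) / block - 0.5, 0.0), pw - 1.0)
            x0 = int(fx)
            x1 = min(x0 + 1, pw - 1)
            wx = fx - x0
            # Interpolate horizontally on the two neighboring preview rows, then vertically.
            top = preview[y0][x0] * (1 - wx) + preview[y0][x1] * wx
            bottom = preview[y1][x0] * (1 - wx) + preview[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

Applied to two neighboring DC values of 0 and 80, this produces a gradual ramp across the block boundary instead of a single hard step.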

[Image: preview from the improved upsampling technique]

Those techniques apply to progressive JPEGs. More enhancements are forthcoming for JPEG XL. For example, JPEG XL can encode the DC itself progressively, so the first preview appears sooner. Normally, it takes 10 to 15 percent of the total file size to get the DC, which is the first full-image preview of a progressive JPEG. With progressive DC, a JPEG XL feature, you can show a first LQIP when only one percent of the total file size has arrived.
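To put those percentages in perspective, this Python sketch computes how many bytes, and how much idealized transfer time, precede the first full-image preview. The 150-KB file size and 3-Mbps link are assumptions carried over from the earlier examples in the text; decimal units are used and latency is ignored.

```python
FILE_KB = 150   # assumed file size, matching the earlier 150-KB example
LINK_MBPS = 3   # assumed 3G link, as earlier in the text


def preview_cost(fraction: float) -> tuple:
    """Kilobytes and milliseconds (no latency) until the first full-image preview."""
    kb = fraction * FILE_KB
    ms = kb * 8 / LINK_MBPS  # kilobits divided by megabits per second gives milliseconds
    return kb, ms


for label, fraction in [
    ("progressive JPEG DC, 10%", 0.10),
    ("progressive JPEG DC, 15%", 0.15),
    ("JPEG XL progressive DC, 1%", 0.01),
]:
    kb, ms = preview_cost(fraction)
    print(f"{label:>27}: {kb:5.1f} KB, {ms:4.0f} ms")
```

Under these assumptions, the first preview arrives after about 4 ms instead of 40 to 60 ms.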

JPEG XL offers two more options for advanced progressive encoding:

  • Middle-out scans: In JPEG, scans always proceed from top to bottom. JPEG XL encodes images in groups of 256x256 pixels, and you can reorder those groups. So, you can start every scan with the groups in the middle, which presumably contain the most enticing part of the image.
  • Saliency progression: Each progressive scan of a JPEG must add the same amount of new detail to every part of the image. Not so with JPEG XL. That means you can encode images progressively based on saliency, for example by sending the faces or foreground objects in more detail first and the background later.
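The group reordering behind both options can be sketched in Python as follows. This only computes an order in which to send the 256x256 groups; how a JPEG XL encoder actually signals that order in the file is not modeled here, and the per-group saliency score (for example, from a face detector) is a hypothetical input.

```python
import math
from typing import Callable, List, Tuple

GROUP = 256              # JPEG XL encodes images in groups of 256x256 pixels
Group = Tuple[int, int]  # (column, row) index of a group


def raster_order(width: int, height: int) -> List[Group]:
    """All groups, top to bottom and left to right (JPEG-style scan order)."""
    cols, rows = math.ceil(width / GROUP), math.ceil(height / GROUP)
    return [(gx, gy) for gy in range(rows) for gx in range(cols)]


def middle_out_order(width: int, height: int) -> List[Group]:
    """Groups sorted by the distance of their center from the image center."""
    cx, cy = width / 2, height / 2

    def distance(g: Group) -> float:
        gx, gy = g
        # Center of the group, clipped to the image for partial edge groups.
        x = (gx * GROUP + min((gx + 1) * GROUP, width)) / 2
        y = (gy * GROUP + min((gy + 1) * GROUP, height)) / 2
        return math.hypot(x - cx, y - cy)

    return sorted(raster_order(width, height), key=distance)  # stable: ties keep raster order


def saliency_order(width: int, height: int, saliency: Callable[[int, int], float]) -> List[Group]:
    """Most salient groups first; `saliency` is a hypothetical per-group score."""
    return sorted(raster_order(width, height), key=lambda g: -saliency(*g))
```

For a 768x768 image, `middle_out_order` sends the central group first, then its four edge neighbors, and the corners last.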

Largest Contentful Paint

Largest Contentful Paint (LCP) is a new user-experience metric that Google will use as a factor in ranking search results. Although the details are still under discussion, a consensus has been reached that progressive rendering should be taken into account in LCP.

In general, enhanced progressive rendering leads to faster perceived web performance and a better user experience. As LCP captures those improvements more accurately, they can translate into higher Google-search rankings and stronger SEO.

The Expediency of JPEG XL

Unlike WebP, HEIC, and AVIF, JPEG and JPEG XL were designed for progressive decoding, and JPEG XL’s progressive capabilities are superior to JPEG’s. Recall that, with progressive DC, a reasonably appealing preview becomes available after only about one percent of the image data has been transferred, with no need for separate, redundant LQIPs or preview images.

In summary, JPEG XL is a boon for the browsing experience, reducing bandwidth and displaying images faster and with higher fidelity. I’ll keep you posted on the format’s development.

My next article will discuss what it takes to create a codec to replace JPEG and why previous attempts failed. Stay tuned.
