Cloudinary Blog

Evolution of <img>: Gif without the GIF

Embed MP4 in HTML <img> Tags for Improved GIF-Like Experience

TL;DR

  • Gifs are awesome but terrible for quality and performance
  • Replacing Gifs with <video> is /better/ but has perf. drawbacks: not preloaded, uses range requests
  • Now you can <img src=".mp4">s in Safari Technology Preview
  • Early results show mp4s in <img> tags display 20x faster and decode 7x faster than the GIF equivalent - in addition to being 1/14th the file size!
  • Background CSS video & Responsive Video can now be a “thing”.
  • Finally - cinemagraphs without the downsides of Gifs! Now we wait for the other browsers to catch-up

Intro

I both Ode to Geocities love and Thanks Tim Kadlec hate animated Gifs.

Safari Tech Preview has changed all of this. Now I love and love animated “Gifs”.

Everybody loves animated Gifs!

Animated Gifs are a hack. To quote from the original Gif89a specification:

Note
The Graphics Interchange Format is not intended as a platform for animation, even though it can be done in a limited way.

But they have become an awesome tool for cinemagraphs, memes, and creative expression. All of this awesomeness, however, comes at a cost. Animated Gifs are terrible for web performance. They are HUGE in size, impact cellular data bills, require more CPU and memory, cause repaints, and are battery killers. Typically Gifs are 12x larger files than H.264 videos, and take 2x the energy to load and display in a browser. And we’re spending all of those resources on something that doesn’t even look very good – the GIF 256 color limitation often makes GIF files look terrible (although there are some cool workarounds).

My daughter loves them – but she doesn't understand why her battery is always dead.

Gifs have many advantages: they are requested immediately by the browser preloader, they play and loop automatically, and they are silent! Implicitly they are also shorter. Market research has shown that users have higher engagement with, and generally prefer both micro-form video (< 1minute) and cinemagraphs (stills with subtle movement), over longer-form videos and still images. Animated Gifs are great for user experience.

videos that are <30s have highest conversion

So how did I go from love/hating Gifs to love/loving “Gifs”?

In the latest Safari Tech Preview, thanks to some hard work by Jer Noble, we can now use MP4 files in <img> tags. The intended use case is not long-form video, but micro-form, muted, looping video – just like animated Gifs. Take a look for yourself:

Copy to clipboard
<img src=”rocky.mp4”>

Rocky!

Cool! This is going to be awesome on so many fronts – for business, for usability, and particularly for web performance!

... but we already have <video> tags?

As many have already pointed out, using the <video> tag is much better for performance than using animated Gifs. That’s why in 2014 Twitter famously added animated GIF support by not adding GIF support. Twitter instead transcodes Gifs to MP4s on-the-fly, and delivers them inside <video> tags. Since all browsers now support H.264, this was a very easy transition.

Copy to clipboard
<video autoplay loop muted inline>
  <source src="eye-of-the-tiger-video.webm" type="video/webm">
  <source src="eye-of-the-tiger-video.mp4" type="video/mp4">
  <img src="eye-of-the-tiger-fallback.gif" />
</video>

Transcoding animated Gifs to MP4 is fairly straightforward. You just need to run ffmpeg -i source.gif output.mp4

However, not everyone can overhaul their CMS and convert <img> to <video>. Even if you can, there are three problems with this method of delivering Gif-like, micro-form video:

  1. Browser performance is slow with <video>

    As Doug Sillars recently pointed out in a HTTP Archive post, there is huge visual presentation performance penalty when using the <video> tag.

    Sites without video load about 28 percent faster than sites with video

    Unlike <img> tags, browsers do not preload <video> content. Generally preloaders only preload JavaScript, CSS, and image resources because they are critical for the page layout. Since <video> content can be any length – from micro-form to long-form – <video> tags are skipped until the main thread is ready to parse their content. This delays the loading of <video> content by many hundreds of milliseconds.

    velocityfilmstrip

    For example, the hero video at the top of the Velocity conference page is only requested 5 full seconds into the page load. It’s the 27th requested resource and it isn’t even requested until after Start Render, after webfonts are loaded.

    Worse yet, many browsers assume that <video> tags contain long-form content. Instead of downloading the whole video file at once, which would waste your cell data plan in cases where you do not end up watching the whole video, the browser will first perform a 1-byte request to test if the server supports HTTP Range Requests. Then it will follow with multiple range requests in various chunk sizes to ensure that the video is adequately (but not over-) buffered. The consequence is multiple TCP round trips before the browser can even start to decode the content and significant delays before the user sees anything. On high-latency cellular connections, these round trips can set video loads back by hundreds or thousands of milliseconds.

    safari range requests

    And what performs even worse than the native <video> element? The typical JavaScript video player. Often, the easiest way to embed a video on a site is to use a hosted service like YouTube or Vimeo and avoid the complexities of video encoding, hosting, and UX. This is normally a great idea, but for micro-form video, or critical content like hero videos, it just adds to the delay because of the JavaScript players and supporting resources that these hosting services inject (css/js/jpg/woff). In addition to the <video> markup you are forcing the browser to download, evaluate, and execute the JavaScript player -- and only then can the video start to load.

    video player

    As many people know, I love my Loki jacket because of its built in mitts, balaclava, and a hood that is sized for helmets. But take a look at the Loki USA homepage – which uses a great hero-video, hosted on Vimeo:

    lokiusa.com filmstrip

    lokiusa.com video

    If you look closely, you can see that the JavaScript for the player is actually requested soon after DOM Complete. But it isn’t fully loaded and ready to start the video stream until much later.

    lokiusa.com waterfall

    Check out the WPT Results

  2. You can’t right click and save video

    Most long-form video content – vlogs, TV, movies – is delivered via JavaScript-based players. Usually these players provide users with a convenient “share now” link or bookmark tool, so they can come back to YouTube (or wherever) and find the video again. In contrast, micro-form content – like memes and cinemagraphs – usually doesn’t come via a player, and users expect to be able to download animated Gifs and send them to friends, like they can with any image on the web. That meme of the dancing cat was sooo funny – I have to share it with all my friends!

    If you use <video> tags to deliver micro-form video, users can't right-click, click-and-drag, or force touch, and save. And their dancing-cat joy becomes a frustrating UX surprise.

  3. Autoplay abuse

    Finally, using <video> tags and MP4s instead of <img> tags and GIFs is brings you into the middle of an ongoing cat and mouse game between browsers and unconscionable ad vendors, who abuse the <video autoplay> attribute in order to get the users’ attention. Historically, mobile browsers have ignored the autoplay attribute and/or refused to play videos inline, requiring them to go full screen. Over the last couple of years, Apple and Google have both relaxed their restrictions on inline, autoplay videos, allowing for Gif-like experiences with the <video> tag. But again, ad networks have abused this, causing further restrictions: if you want to autoplay <video> tags you need to mark the content with muted or remove the audio track all together.

... but we already have animated WebP! And animated PNG!

The GIF format isn’t the only animation-capable, still-image format. WebP and PNG have animation support, too. But, like GIF, they were not designed for animation and result in much larger files, compared to dedicated video codecs like H.264, H.265, VP9, and AV1.

Animated PNG is now widely supported across all browsers, and while it addresses the color pallete limitation of GIF, it is still an inefficient file format for compressing video.

Animated WebP is better, but compared to true video formats, it’s still problematic. Aside from not having a formal standard, animated WebP lacks chroma subsampling and wide-gamut support. Further, the ecosystem of support is fragmented. Not even all versions of Android, Chrome, and Opera support animated WebP – even though those browsers advertise support with the Accept: image/webp. You need Chrome 42, Opera 15+ or Android 5+.

webp

webm

apng

hevc

So while animated WebPs compress much better than animated GIFs or aPNGs, we can do better. (See file size comparisons below)

Having our cake and eating it too

By enabling true video formats (like MP4) to be included in <img> tags, Safari Technology Preview has fixed these performance and UX problems. Now, our micro-form videos can be small and efficient (like MP4s delivered via the <video> tag) and they can can be easily preloaded, autoplayed, and shared (like our old friend, the animated GIF).

Copy to clipboard
<img src="ottawa-river.mp4">

So how much faster is this going to be? Pull up the developer tools and see the difference in Safari Technology Preview and other browsers:

Take a look at this!

Unfortunately Safari doesn’t play nice with WebPageTest, and creating reliable benchmark tests is complicated. Likewise, Tech Preview’s usage is fairly low, so comparing performance with RUM tools is not yet practical.

We can, however, do two things. First, compare raw byte sizes, and second, use the Image.decode() promise to measure the device impact of different resources.

Byte Savings

First, the byte size savings. To compare this I took the trending top 100 animated GIFs from giphy.com and converted them into VP8, VP9, WebP, H.264, and H.265.

Note
NB: These results should be taken as directional only! Each codec could be tuned much more; as you can see the default VP9 encoding settings fair worse, here, than the default VP8 outputs. A more comprehensive study should be done that considers visual quality, measured by SSIM.

Below are the median (p50) results of the conversion:

Format Bytes p50 % change p50
GIF 1,713KB
WebP 310KB -81%
WebM/VP8 57KB -97%
WebM/VP9 66KB -96%
WebM/AV1 TBD
MP4/H.264 102KB -93%
MP4/H.265 43KB -97%

So, yes, an animated WebP will almost always be smaller than an animated GIF – but any video format will be much, much smaller. This shouldn’t surprise anyone since modern video codecs are highly optimized for online video streaming. H.265 fairs very well, and we should expect the upcoming AV1 to fair well, too.

The benefits here will not only be faster transit but also substantial data-plan cost savings for end users.

Net-net, using video in <img> tags is going to be far, far better for users on cellular connections.

Decode and Visual Performance Improvements

Next, let’s consider the impact that decoding and displaying micro-form videos has on the browsing experience. H.264 (and H.265) has the notable advantage of being hardware decoded instead of using the primary core for decode.

How can we measure this? Since browsers haven’t yet implemented the proposed hero image API, we can use Steve Souder’s User Timing and Custom Metric strategy as a good aproximation of when the image starts to display to the user. This strategy doesn’t measure frame rate, but it does tell us roughly when the first frame is displayed. Better yet, we can also use the newly adopted Image.decode() event promise to measure decode performance. In the test page below, I inject a unique GIF and MP4 in an <img> tag 100 times and compare the decode and paint performance.

Copy to clipboard
let image = new Image;
t_startReq = new Date().getTime();
document.getElementById("testimg").appendChild(image);
image.onload = timeOnLoad;
image.src = src;
return image.decode().then(() => { resolve(image); });

The results are quite impressive! Even on my powerful 2017 MacBook Pro, running the test locally, with no network throttling, I can see GIFs taking 20x longer than MP4s to draw the first frame (signaled by the onload event), and 7x longer to decode!

Local test on powerful MacBook Pro

Curious? Clone the repo and test for yourself. I will note that adding network conditions on the transit of the GIF v. MP4 will disproportionately skew the test results. Specifically: since decode can start happening before the last byte finishes, the delta between transfer, display and decode becomes much smaller. What this really tells us is that just the byte savings alone will substantially improve the user experience. However, factoring out the network as I’ve done on a localhost run, you can see that using video has substantial performance benefits for energy consumption as well.

How can you implement this?

So now that Safari Technology Preview supports this design pattern, how can you actually take advantage of it, without serving broken images to non-supporting browsers? Good news! It's relatively easy.

Option 1: Use Responsive Images

The simplest way is to use the <source type> attribute of the HTML5 <picture> tag.

Copy to clipboard
<picture>
  <source type=”video/mp4” srcset=”cats.mp4”>
  <source type=”image/webp” srcset=”cats.webp”>
  <img src=”cats.gif”>
</picture>

I’d like to say we can stop there. However, there is this nasty WebKit bug in Safari that causes the preloader to download the first <source> regardless of the MIME type declaration. The main DOM loader realizes the error and selects the correct one. However, the damage will be done. The preloader squanders its opportunity to download the image early and on top of that, starts downloading the wrong version, wasting bytes. The good news is that I’ve patched this bug and the patch should land in Safari TP 45.

In short, using the <picture> and <source type> for MIME type selection is not advisable until the next version of Safari reaches 90%+ of Safari’s total user base.

Option 2: Use MP4, animated WebP and Fallback to GIF

If you don't want to change your HTML markup, you can use HTTP to send MP4s to Safari with content negotiation. In order to do so, you must generate multiple copies of your cinemagraphs (just like before) and Varyresponses based on both the Accept and User-Agent headers.

This will get a bit cleaner once WebKit BUG 179178 is resolved and you can add a test for the Accept: video/* header, (the same way that you can test for Accept: image/webp, now). But the end result is that each browser gets the best format for <img>-based micro-form videos that it supports:

Browser Accept Header Response
Safari TP41+ H.264 MP4
Accept: video/mp4 H.264 MP4
Chrome 42+ Accept: image/webp aWebP
Opera 15 Accept: image/webp aWebP
Accept: image/apng aPNG
Default GIF

In nginx this would look something like:

Copy to clipboard
if ($http_user_agent ~* "Safari/605[.0-9]+$") {
   rewrite ^/(.*)$ https://www.domain2.com/$1 permanent;
}

map $http_user_agent $mp4_suffix {
    default   "";
    “~*Safari/605”  ".mp4";
  }

location ~* .(gif)$ {
      add_header Vary Accept;
      try_files $uri$mp4_suffix $uri =404;
    }

Of course, don't forget the Vary: Accept, User-Agent to tell coffee-shop proxies and your CDN to cache each response differently. In fact, you should probably mark the Cache-Control as private and use TLS to ensure that the less sophisticated ISP Performance-Enhancing-Proxies don't cache the content.

Copy to clipboard
GET /example.gif HTTP/1.1
Accept: image/png; video/*; */*
User-Agent: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/605.1.13 (KHTML, like Gecko) Version/11.1 Safari/605.1.13

…

HTTP/1.1 200 OK
Content-Type: video/mp4
Content-Length: 22378567
Vary: Accept, User-Agent

Option 3: Use RESS and fall Back to

If you can manipulate your HTML, you can adopt the Responsive-Server-Side (RESS) technique. This option moves the browser detection logic into your HTML output.

For example, you could do it like this with PHP:

Copy to clipboard
<?php if(strlen(strstr($_SERVER['HTTP_USER_AGENT'],"Safari/605")) <= 0 ){ // if not firefox ?>
<img src=example.mp4>
<?php } else {?>
<img src=example.gif>
<?php }?>

As above, be sure to emit a Vary: User-Agent response to inform your CDN that there are different versions of your HTML to cache. Some CDNs automatically honor the Vary headers while others can support this with a simple update to the CDN configuration.

Bonus: Don’t forget to remove the audio track

Now, since you aren’t converting GIF to MP4s but rather you are converting MP4s to GIFs, we should also remember to strip the audio track for extra byte savings. (Please tell me you aren’t using GIFs as your originals. Right?!) Audio tracks add extra bytes that we can quickly strip off since we know that our videos will be played on mute anyway. The simplest way to do this with ffmpeg is:

Copy to clipboard
ffmpeg -i cats.mp4 -vcodec copy -an cats.mp4

Are there size limits?

As I’m writing this, Safari will blindly download whatever video you specify in the <img> tag, no matter how long it is. On the one hand, this is expected because it helps improve the performance of the browser. Yet, this can be deadly if you push down a 120-minute video to the user. I've tested multiple sizes and all were downloaded as long as the user hung around. So, be courteous to your users. If you want to push longer-form video content, use the <video> tag for better performance.

What's next? Responsive video and hero backgrounds

Now that we can deliver MP4s via <img> tags, doors are opening to many new use cases. Two that come to mind: responsive video, and background videos. Now that we can put MP4s in srcsets, vary our responses for them using Client Hints and Content-DPR, and art direct them with <picture media>, well – think of the possibilities!

Copy to clipboard
<img src="cat.mp4" alt="cat"
  srcset="cat-160.mp4 160w, cat-320.mp4 320w, cat-640.mp4 640w, cat-1280.mp4 1280w"
  sizes="(max-width: 480px) 100vw, (max-width: 900px) 33vw, 254px">

Video in CSS background-image: url(.mp4) works, too!

Copy to clipboard
<div style=”width:800px, height: 200px, background-image:url(colin.mp4)”/>

Conclusion

By enabling video content in <img> tags, Safari Technology Preview is paving the way for awesome GIF-like experiences, without the terrible performance and quality costs associated with GIF files. This functionality will be fantastic for users, developers, designers, and the web. Besides the enormous performance wins that this change enables, it opens up many new use cases that media and e-commerce businesses have been yearning to implement for years. Here’s hoping the other browsers will soon follow. Google? Microsoft? Mozilla? Samsung? Your move!

This was originally posted on Performance Calendar


Want to Learn More About Image Optimization?

Recent Blog Posts

Our $2B Valuation

By
Blackstone Growth Invests in Cloudinary

When we started our journey in 2012, we were looking to improve our lives as developers by making it easier for us to handle the arduous tasks of handling images and videos in our code. That initial line of developer code has evolved into a full suite of media experience solutions driven by a mission that gradually revealed itself over the course of the past 10 years: help companies unleash the full potential of their media to create the most engaging visual experiences.

Read more
Direct-to-Consumer E-Commerce Requires Compelling Visual Experiences

When brands like you adopt a direct–to-consumer (DTC) e-commerce approach with no involvement of retailers or marketplaces, you gain direct and timely insight into evolving shopping behaviors. Accordingly, you can accommodate shoppers’ preferences by continually adjusting your product offering and interspersing the shopping journey with moments of excitement and intrigue. Opportunities abound for you to cultivate engaging customer relationships.

Read more
Automatically Translating Videos for an International Audience

No matter your business focus—public service, B2B integration, recruitment—multimedia, in particular video, is remarkably effective in communicating with the audience. Before, making video accessible to diverse viewers involved tasks galore, such as eliciting the service of production studios to manually dub, transcribe, and add subtitles. Those operations were costly and slow, especially for globally destined content.

Read more
Cloudinary Helps Minted Manage Its Image-Generation Pipeline at Scale

Shoppers return time and again to Minted’s global online community of independent artists and designers because they know they can count on unique, statement-making products of the highest quality there. Concurrently, the visual imagery on Minted.com must do justice to the designs into which the creators have poured their hearts and souls. For Minted’s VP of Engineering David Lien, “Because we are a premium brand, we need to ensure that every single one of our product images matches the selected configuration exactly. For example, if you pick an 18x24 art print on blue canvas, we will show that exact combination on the hero images in the PDF.”

Read more
Highlights on ImageCon 2021 and a Preview of ImageCon 2022

New year, same trend! Visual media will continue to play a monumental role in driving online conversions. To keep up with visual-experience trends and best practices, Cloudinary holds an annual conference called ImageCon, a one-of-a-kind event that helps attendees create the most engaging visual experiences possible.

Read more