Cloudinary Blog

Audio in Video Is Crucial. Here's How to Produce High-Quality Audio

Why Audio in Video Matters

Many content creators and consumers tend to regard video as visuals, but that’s only part of the experience. Immersive video content includes strong audio. Just like in a movie, the audio for video content comprises many components: the narrator or subjects, the background music that sets the mood and draws viewers in, sound effects, and so forth.

It’s easy to overlook audio in deference to the visuals. However, high-quality audio counts as much in short videos as it does in long productions. Let’s dig into how poor audio impacts otherwise compelling video and explore how Cloudinary helps fix the issues for a more engaging viewing experience.

Understanding the Production Problems

Audio problems are annoying. For example, with multiple clips produced at different times or by different people, the creator might neglect to level the sound, causing sound variation in compound videos.

Imagine this scenario: A viewer is a third of the way through a video, and suddenly the narrator’s voice becomes twice as loud. Or, worse, the next piece of background music jumps up a level and drowns out everything else. Such an abrupt volume change breaks the viewer’s attention, makes the video feel less immersive, and might even cause the viewer to stop watching. On a streaming site like YouTube or Vimeo, you as the producer might lose views or even receive thumbs-down ratings or nasty comments.
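To make the leveling problem concrete, here is a minimal sketch that measures how far apart two clips are in level and how much gain would bring one in line with the other. It assumes the audio has already been decoded into arrays of float samples between -1 and 1; the function names and sample data are hypothetical. It also uses plain RMS level for simplicity, whereas production tools typically measure perceived loudness instead.

```javascript
// Root-mean-square level of a clip's samples (floats in the range -1..1).
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (const s of samples) sumOfSquares += s * s;
  return Math.sqrt(sumOfSquares / samples.length);
}

// Gain in decibels that brings a clip's RMS level to the target level.
function gainToMatchDb(clipRms, targetRms) {
  return 20 * Math.log10(targetRms / clipRms);
}

// Hypothetical sample data: the second clip has twice the amplitude of the first.
const quietClip = [0.25, -0.25, 0.25, -0.25];
const loudClip = [0.5, -0.5, 0.5, -0.5];

// Gain to apply to the loud clip so that it matches the quiet one.
const gainDb = gainToMatchDb(rms(loudClip), rms(quietClip));
console.log(gainDb.toFixed(2)); // -6.02, i.e., halve the loud clip's amplitude
```

A doubling of amplitude works out to roughly 6 dB, so even a modest mismatch between clips recorded in different sessions is easy to hear.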

Other problems can result in annoying audio. Parts of the video might go quiet or become almost inaudible. Or the music is scratchy from poor quality or a low recorded bitrate.

However, even a video with well-leveled audio might not be ideal. Why? Because you might have produced the audio at too high a quality level or with a codec that some machines don’t support, which might crash playback on those machines. Additionally, the video quality might decline if a device’s connection or processor cannot handle the file.

Accessibility of audio matters just as much as quality and leveling since some of your audience might be deaf or hard of hearing, or they might speak a different language. Subtitles or other visual cues would be of tremendous help for them.

Producing optimal audio is challenging. Even experienced creators occasionally overlook certain details or run into obstacles.

Working With Tools

Tools can help solve audio issues. A studio-quality microphone can make a huge difference. A Blue Yeti model, for example, is relatively inexpensive and offers a moderate level of recording control.

In addition, with premium-quality studio headphones, you can listen to the video during the production process and identify problems. Budget allowing, whole devices dedicated to audio processing and sound control are available, not to mention first-rate computers, equipment, or devices for audio production.

On the other hand, budget constraints might preclude those hardware purchases, especially at the outset. Software is a far more economical alternative, and cloud production takes the load off your machines—especially if you work with only one computer.

As one of the longest-running streaming sites, YouTube offers rudimentary tools, but not a full suite, for video editing. Production software also varies in its tooling: some packages focus on video, others on audio, and many control only the basic audio functions in postproduction. Other postproduction tools would come in handy to beat your audio into shape.

Leveraging Cloudinary’s Postproduction Assistance

Cloudinary offers controls for both video and audio. While editing a video with Cloudinary, you can upload the audio files separately and work with several other tools with transformation capabilities similar to those in photo-editing software: clip, stretch, and so on. Those tools work directly on audio and video, even if the files are already encoded.

Plus, by uploading and hosting videos with Cloudinary, you can apply transformations through APIs, which support services of all kinds. Cloudinary even comes with a video player.

The next section describes a few transformations as examples. Feel free to use some of Cloudinary’s example videos or upload your own audio and video. Before you start, sign up for a free Cloudinary account.

Transforming Audio

Here’s a demo of a simple transformation of a video from the Cloudinary Media Library. Follow these steps:

  1. Double-click recipes and choose one of the four video options. After loading the video, click Transform to go to the video’s Transform page, where you can resize, crop, format, and edit videos on the fly. You can also add special effects.

  2. Scroll down to Audio Codec and click No Audio to remove the audio from the video so that you can overlay another version. A Refresh button then appears on the demo player.

  3. Click Refresh to preview the change.

    The code line below the player will have changed, and you can now download the edited video or post it as is on a website. If you’re using JavaScript or another framework or language, you can derive code to generate a player for it. See this example with React:

    import { Video, Transformation } from "cloudinary-react";

    <Video publicId="recipes/asltranslation">
      <Transformation audioCodec="none" />
    </Video>

Other controls are also available. For example, you can change the audio frequency or switch the codec to a format that performs better on other systems. (As mentioned earlier, too high a quality level or an unsupported codec might cause crashes.) You can also chain transformations for multiple edits.
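The same edits can also be expressed directly in the delivery URL. The sketch below is a hypothetical helper, not part of any Cloudinary SDK, that assembles such a URL as a plain string. It assumes the URL forms `ac_<codec>` for the audio codec and `af_<frequency>` for the audio frequency, with parameters inside one step separated by commas and chained steps separated by slashes. `my-cloud` is a placeholder cloud name; check the parameter names against Cloudinary’s transformation reference before relying on them.

```javascript
// Builds a Cloudinary video delivery URL from chained transformation steps.
// Each step is an array of parameters (joined with ","); steps are chained with "/".
function buildVideoUrl(cloudName, publicId, steps, format = "mp4") {
  const chain = steps.map((params) => params.join(",")).join("/");
  const prefix = `https://res.cloudinary.com/${cloudName}/video/upload`;
  return chain
    ? `${prefix}/${chain}/${publicId}.${format}`
    : `${prefix}/${publicId}.${format}`;
}

// One step: remove the audio track, as in the demo above.
const muted = buildVideoUrl("my-cloud", "recipes/asltranslation", [["ac_none"]]);
console.log(muted);

// Two chained steps: re-encode the audio, then scale the video to 640 pixels wide.
const reencoded = buildVideoUrl("my-cloud", "recipes/asltranslation", [
  ["ac_aac", "af_44100"],
  ["w_640"],
]);
console.log(reencoded);
```

Keeping each step as its own array mirrors how chained transformations are applied: one after another, each on the result of the previous step.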

Diving Deeper Into the Flow

To correct or edit audio directly, use Cloudinary’s MediaFlows system, with which you can custom-build a video editor through a block-type programming interface, with different features per block.

MediaFlows is in Beta and requires a separate login after you register through Google or GitHub.

For sound enhancements, Cloudinary has worked with a partner to build the Media Enhancement block. To enable that block, contact Cloudinary Support. Also, because the block’s features are advanced, they require an additional API key.

Afterwards, you can use the block to transform the videos within your MediaFlows app. A new block is displayed, in which you can edit the volume, reduce the noise level, isolate speech, or apply speech-leveling effects to fine-tune the video’s audio quality.

Try This MediaFlow Today!
Ready to try MediaFlows for yourself? Check out “Enhancing Audio for Video using Media Enhance API”.

Capitalizing on a Cloudinary Add-On

When you’re building an app, Cloudinary’s add-ons can help make your videos accessible. For instance, if you’ve built a custom uploader with Cloudinary, you can request a transcription through the Google AI Video Transcription Add-On by calling it in code through the Cloudinary API, just as you do with video transformations.

A case in point: When uploading a video through your app, you can request a transcription in the same upload call. The call below is written for the Node.js SDK; equivalent calls are available for other languages and frameworks.

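  const cloudinary = require("cloudinary");

  // "my_video.mp4" is a placeholder; substitute the path to your own file.
  cloudinary.v2.uploader.upload("my_video.mp4",
    { resource_type: "video",
      raw_convert: "google_speech" },
    function(error, result) { console.log(result, error); });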

Cloudinary and the Video Transcription tool transcribe the video in the language you specify. You can then turn the transcription into captions and configure Cloudinary to link to other add-ons for a more accessible video for wider audiences.
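As an illustration of turning a transcription into captions, the following sketch converts transcript segments into WebVTT, a caption format that HTML5 video players understand. The input shape is an assumption: each segment is taken to have a `transcript` string and a `words` array with `start_time` and `end_time` in seconds. The transcript file you actually receive may be structured differently, so treat the field names and sample data here as hypothetical.

```javascript
// Formats a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Converts transcript segments into a WebVTT document, one cue per segment.
// Segments without word timings are skipped because they cannot be placed in time.
function toWebVtt(segments) {
  const cues = segments
    .filter((seg) => seg.words && seg.words.length > 0)
    .map((seg) => {
      const start = seg.words[0].start_time;
      const end = seg.words[seg.words.length - 1].end_time;
      return `${toTimestamp(start)} --> ${toTimestamp(end)}\n${seg.transcript.trim()}`;
    });
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

// Hypothetical transcript data in the assumed shape.
const segments = [
  {
    transcript: "Hello and welcome.",
    words: [
      { word: "Hello", start_time: 0.5, end_time: 0.9 },
      { word: "and", start_time: 0.9, end_time: 1.1 },
      { word: "welcome.", start_time: 1.1, end_time: 1.75 },
    ],
  },
];

console.log(toWebVtt(segments));
```

Using one cue per segment keeps the captions readable; splitting long segments into shorter cues by word timing would be a natural refinement.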

Wrapping Up the Track

Because the quality and content of audio can enhance or destroy a video, audio is just as crucial as the visuals. A critical task is to ensure that your audio timing is on track.

Even though you can fix most audio problems with the correct tools, you need more help at times, especially if you’re working as a single developer. Give Cloudinary a try to see (and hear) how it can help you attain the video feel you aim for and reach wider audiences. Cloudinary also works with other services, boosting the range of features for managing audio.
