No matter your business focus—public service, B2B integration, recruitment—multimedia, especially video, is remarkably effective in communicating with your audience. In the past, making video accessible to diverse viewers involved a multitude of tasks, such as enlisting production studios to manually dub, transcribe, and subtitle the content. Those operations were costly and slow, especially for content destined for a global audience.
It’s now a different story with cloud-based tools and services, which enable international outreach for enterprises and small businesses alike. Noteworthy are machine-learning tools that integrate the transcription process into the production cycle with a quick turnaround.
This article describes several capabilities of Cloudinary's Google AI Video Transcription add-on and the Google Cloud Speech API, showing how to:
- Automate translation into multiple languages.
- Generate subtitles and transcripts.
- Adjust logos and apply overlays and special effects for specific regions.
Adding Subtitles and Transcripts
Subtitles are one of the easiest ways to make videos accessible to a diverse audience: with subtitles, viewers can comprehend and enjoy visual content without knowing the language in which the video was recorded. Transcripts, in turn, are extremely helpful for viewers who are hard of hearing or who have limited vision, since those viewers can follow the text directly or through a screen reader.
Even though the Google Cloud Speech API works without Cloudinary's add-on, using it on its own requires a much more complex configuration. Cloudinary simplifies the process by automatically generating subtitles and transcripts from the videos in your account's Media Library, after which you can display them in the video player. Furthermore, with Cloudinary's Google Translation add-ons, you can translate that text in real time into many supported languages; the add-ons can also translate your video's metadata—title, description, tags—to facilitate search.
Those add-ons, each designed for a specific purpose, such as video tagging, image moderation, and enhancement, generate transcripts along with the data needed to render text on screen. The example below, which transcribes a speech by President Lincoln, demonstrates the add-ons in action.
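As a rough sketch of that workflow with Cloudinary's Python SDK (the public ID lincoln_speech, the local file name, and the credentials are placeholders), you might request a transcript at upload time with the Google AI Video Transcription add-on and then deliver the video with the generated subtitles overlaid:

```python
# pip install cloudinary
import cloudinary
import cloudinary.uploader
from cloudinary import CloudinaryVideo

# Placeholder credentials; replace with your own.
cloudinary.config(
    cloud_name="demo",
    api_key="YOUR_API_KEY",
    api_secret="YOUR_API_SECRET",
)

# Upload the video and request a transcript from the Google AI Video
# Transcription add-on. Appending ":srt:vtt" also generates subtitle files.
cloudinary.uploader.upload(
    "lincoln_speech.mp4",               # placeholder local file
    resource_type="video",
    public_id="lincoln_speech",
    raw_convert="google_speech:srt:vtt",
)

# Once the transcript is ready, deliver the video with the generated
# subtitles rendered as an overlay.
url = CloudinaryVideo("lincoln_speech").build_url(
    transformation=[
        {"overlay": {"resource_type": "subtitles",
                     "public_id": "lincoln_speech.transcript"}},
        {"flags": "layer_apply"},
    ]
)
print(url)
```

The generated SRT and VTT files can also be loaded as selectable text tracks in the Cloudinary Video Player instead of being burned into the frames.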
Translating Embedded Content
Even with translated audio, videos can contain language-specific information within their frames, such as road signs, whiteboards, restaurant menus, and slide presentations. Depending on how viewers interact with your media, you might need to translate that embedded content, too. As an integral part of the video, embedded content is just as important as the primary content.
With AI-based optical character recognition (OCR) tools like Cloudinary’s OCR Text Detection and Extraction add-on, you can perform two tasks, as shown in the sketch after this list:
- Extract frames from videos and scan them for content that requires translation.
- Translate the content into another language or present it as a tag, because most screen readers require text with ARIA tags. The extracted text can also provide more context about the video.
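Here is a minimal sketch of those two steps with Cloudinary's Python SDK, assuming a hypothetical video with the public ID restaurant_tour already in your Media Library:

```python
import cloudinary
import cloudinary.uploader
from cloudinary import CloudinaryVideo

cloudinary.config(cloud_name="demo", api_key="YOUR_API_KEY", api_secret="YOUR_API_SECRET")

# Step 1: extract a single frame (here, at the 12-second mark) from the
# uploaded video as a JPEG.
frame_url = CloudinaryVideo("restaurant_tour").build_url(
    format="jpg", start_offset="12"
)

# Step 2: upload that frame with the OCR Text Detection and Extraction
# add-on enabled, so Cloudinary returns any text found in the image.
frame = cloudinary.uploader.upload(
    frame_url,
    public_id="restaurant_tour_frame_12",
    ocr="adv_ocr",
)

# The OCR response follows the Google Vision annotation format: the first
# annotation's "description" contains the full extracted text.
annotations = frame["info"]["ocr"]["adv_ocr"]["data"][0].get("textAnnotations", [])
extracted_text = annotations[0]["description"] if annotations else ""
print(extracted_text)

# From here you could send extracted_text to a translation service and store
# the result as a tag or contextual metadata on the video.
```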
For example, if your application offers a virtual environment, such as a restaurant, you can extract a dinner menu from the video frames and link it to the ingredients or the restaurant's location. For a slide presentation, consider extracting important labels and translating them for global consumption. Whereas subtitles deliver the base-level text, translating the embedded content gives viewers a sense of inclusion in the entire production.
OCR tools, such as Google's Video Intelligence API, can also detect the frames in which on-screen text appears, so you can update those frames to display the words, and other data such as metadata, in the required language.
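For instance, the snippet below is a minimal sketch of text detection with the Google Cloud Video Intelligence client library; the gs:// URI is a placeholder, and the returned segments tell you where in the timeline each piece of on-screen text appears:

```python
# pip install google-cloud-videointelligence
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Detect on-screen text in a video stored in Cloud Storage (placeholder URI).
operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/restaurant_tour.mp4",
        "features": [videointelligence.Feature.TEXT_DETECTION],
    }
)
result = operation.result(timeout=300)

# Each annotation reports the detected text and the time segments in which
# it appears, telling you which frames to re-render or overlay.
for annotation in result.annotation_results[0].text_annotations:
    segment = annotation.segments[0].segment
    print(
        annotation.text,
        segment.start_time_offset,
        segment.end_time_offset,
    )
```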
In addition, AI-powered models can convert translated text to human speech in a pitch similar to that of the original speaker, only in a different language. Called AI dubbing or synthetic-film dubbing, this approach helps maintain quality while reaching a wider audience. Because you can acquire more computing power to translate multiple videos in parallel, AI dubbing not only saves time and cost, but also scales well.
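As an illustration of the speech-generation half of that pipeline, here is a minimal sketch that voices a translated line with the Google Cloud Text-to-Speech API; the German text, voice selection, and pitch values are assumptions you would tune to match the original speaker:

```python
# pip install google-cloud-texttospeech
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# A translated line of dialogue to be voiced in German (placeholder text).
synthesis_input = texttospeech.SynthesisInput(text="Vor siebenundachtzig Jahren ...")

# Choose a voice for the target language; pitch and speaking_rate are knobs
# you can tune toward the original speaker's delivery.
voice = texttospeech.VoiceSelectionParams(
    language_code="de-DE",
    ssml_gender=texttospeech.SsmlVoiceGender.MALE,
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    pitch=-2.0,
    speaking_rate=0.95,
)

response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

with open("dubbed_line_de.mp3", "wb") as out:
    out.write(response.audio_content)
```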
Applying Overlays and Special Effects
Production studios often place overlays, such as text descriptions or annotations of in-frame content, a watermark logo, or text-based special effects, on top of the final cut of a video. Besides translating the content within the video frames with the techniques described above, you can add localized overlays and watermark logos to your media for each geographical region with the overlay options in Cloudinary's SDKs.
For example, to add a text overlay to media content programmatically, first translate the content into the local language with Cloudinary’s add-ons, and then add the text to the video according to the production’s style and theme requirements. This approach saves effort during the production cut and eliminates extra postproduction processing.
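A minimal sketch of such a text overlay with Cloudinary's Python SDK might look like this; the public ID product_promo and the translated caption are placeholders:

```python
import cloudinary
from cloudinary import CloudinaryVideo

cloudinary.config(cloud_name="demo", api_key="YOUR_API_KEY", api_secret="YOUR_API_SECRET")

# A caption already translated for the German market (placeholder text).
translated_caption = "Jetzt in Ihrer Region verfügbar"

# Render the caption near the bottom of the video in white Arial text.
url = CloudinaryVideo("product_promo").build_url(
    transformation=[
        {
            "overlay": {
                "font_family": "Arial",
                "font_size": 48,
                "text": translated_caption,
            },
            "color": "white",
            "gravity": "south",
            "y": 40,
        }
    ]
)
print(url)
```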
Organizations with different brand images or names in various regions usually maintain distinct logos for those markets. Through the same Cloudinary SDK, you can add logos and watermarks to your media, generating localized content with the correct overlays and text.
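Building on the same idea, this sketch maps hypothetical region codes to previously uploaded logo public IDs and stamps the right one onto the video at delivery time:

```python
import cloudinary
from cloudinary import CloudinaryVideo

cloudinary.config(cloud_name="demo", api_key="YOUR_API_KEY", api_secret="YOUR_API_SECRET")

# Placeholder public IDs of logos previously uploaded for each market.
REGION_LOGOS = {
    "us": "brand_logo_us",
    "de": "brand_logo_de",
    "jp": "brand_logo_jp",
}

def localized_video_url(video_id: str, region: str) -> str:
    """Stamp the region-specific logo onto the top-right corner of the video."""
    return CloudinaryVideo(video_id).build_url(
        transformation=[
            {
                "overlay": REGION_LOGOS[region],
                "gravity": "north_east",
                "width": 150,
                "opacity": 70,
                "x": 20,
                "y": 20,
            }
        ]
    )

print(localized_video_url("product_promo", "de"))
```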
Summing It Up
To reach an extensive, varied audience and engage them with the same video content, you must translate and internationalize the associated text. Transcripts, subtitles, translations, and regional overlays greatly enhance the user experience while helping ensure regional compliance. Thanks to automation, the process is no longer as tedious and pricey as before, and it speeds up your production cycle. Plus, machine-learning models deliver expert-level results in translation, transcription, and localization at scale.
In partnership with Google, Cloudinary offers numerous capabilities that create a more accessible and scalable content library, helping you reach the broadest audience possible worldwide. For details, check out our documentation.