Cloudinary Blog

How to use OCR Text Recognition to automatically transform images

By Apr 19, 2017



Websites of all kinds enhance user experience with images. In fact, images appear on almost every Web page. Some of the images are uploaded by users, some are proprietary, and some come from 3rd parties. Regardless of origin, many of these images include text elements, and sometimes you need to be aware of or handle that text.

For example, you might need to:

Blur or pixelate text that you don't want displayed on your website
Cover text in an uploaded image with another image
Have an automatic way to extract the text content so you can programmatically analyze it or perform operations based on the detected text. For example, you might want to make sure uploaded images do not contain too much text, or maybe you want to tag your images based on keywords detected in them.

These are common needs, but it's a hassle to do these things manually, even for your own proprietary images, and not an option for images that are uploaded by your users for immediate display.

The good news: This article will show you how you can handle all these and other text detection scenarios, on-the-fly, with only one or a few lines of code. Here are a couple of examples:

An outline overlay shows automatically detected text

The pixelate effect hides detected text

Here's the code that builds the delivery URL for the right-hand (pixelated) image above:


URL:

https://res.cloudinary.com/demo/image/upload/c_crop,w_1300,h_900,g_south_east/e_pixelate_region,g_ocr_text/highway_sign.jpg

Ruby:

cl_image_tag("highway_sign.jpg", :transformation=>[
  {:width=>1300, :height=>900, :gravity=>"south_east", :crop=>"crop"},
  {:effect=>"pixelate_region", :gravity=>"ocr_text"}
  ])

PHP v1:

cl_image_tag("highway_sign.jpg", array("transformation"=>array(
  array("width"=>1300, "height"=>900, "gravity"=>"south_east", "crop"=>"crop"),
  array("effect"=>"pixelate_region", "gravity"=>"ocr_text")
  )))

PHP v2:

(new ImageTag('highway_sign.jpg'))
  ->resize(Resize::crop()->width(1300)->height(900)->gravity(Gravity::compass(Compass::southEast())))
  ->effect(Effect::pixelate()->region(Region::ocr()));

Python:

CloudinaryImage("highway_sign.jpg").image(transformation=[
  {'width': 1300, 'height': 900, 'gravity': "south_east", 'crop': "crop"},
  {'effect': "pixelate_region", 'gravity': "ocr_text"}
  ])

Node.js:

cloudinary.image("highway_sign.jpg", {transformation: [
  {width: 1300, height: 900, gravity: "south_east", crop: "crop"},
  {effect: "pixelate_region", gravity: "ocr_text"}
  ]})

Java:

cloudinary.url().transformation(new Transformation()
  .width(1300).height(900).gravity("south_east").crop("crop").chain()
  .effect("pixelate_region").gravity("ocr_text")).imageTag("highway_sign.jpg");

JS:

cloudinary.imageTag('highway_sign.jpg', {transformation: [
  {width: 1300, height: 900, gravity: "south_east", crop: "crop"},
  {effect: "pixelate_region", gravity: "ocr_text"}
  ]}).toHtml();

jQuery:

$.cloudinary.image("highway_sign.jpg", {transformation: [
  {width: 1300, height: 900, gravity: "south_east", crop: "crop"},
  {effect: "pixelate_region", gravity: "ocr_text"}
  ]})

React:

<Image publicId="highway_sign.jpg" >
  <Transformation width="1300" height="900" gravity="south_east" crop="crop" />
  <Transformation effect="pixelate_region" gravity="ocr_text" />
</Image>

Vue.js:

<cld-image publicId="highway_sign.jpg" >
  <cld-transformation width="1300" height="900" gravity="south_east" crop="crop" />
  <cld-transformation effect="pixelate_region" gravity="ocr_text" />
</cld-image>

Angular:

<cl-image public-id="highway_sign.jpg" >
  <cl-transformation width="1300" height="900" gravity="south_east" crop="crop">
  </cl-transformation>
  <cl-transformation effect="pixelate_region" gravity="ocr_text">
  </cl-transformation>
</cl-image>

.NET:

cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Width(1300).Height(900).Gravity("south_east").Crop("crop").Chain()
  .Effect("pixelate_region").Gravity("ocr_text")).BuildImageTag("highway_sign.jpg")

Android:

MediaManager.get().url().transformation(new Transformation()
  .width(1300).height(900).gravity("south_east").crop("crop").chain()
  .effect("pixelate_region").gravity("ocr_text")).generate("highway_sign.jpg");

iOS:

imageView.cldSetImage(cloudinary.createUrl().setTransformation(CLDTransformation()
  .setWidth(1300).setHeight(900).setGravity("south_east").setCrop("crop").chain()
  .setEffect("pixelate_region").setGravity("ocr_text")).generate("highway_sign.jpg")!, cloudinary: cloudinary)
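All of the SDK calls above produce the same delivery URL. To make the structure of that URL explicit, here is a minimal plain-Ruby sketch that assembles it by hand. Each chained step becomes a comma-separated list of key_value components, and steps are joined with "/". `build_delivery_url` is a hypothetical helper, not part of the Cloudinary SDK. It assumes simple values that need no escaping and does no URL signing.

```ruby
# Hypothetical helper (not a Cloudinary SDK method): assembles a delivery URL
# from chained transformation steps. Each step is a hash of URL parameter keys
# (c, w, h, g, e, ...) to values; components within a step are joined with ","
# and the steps themselves are joined with "/".
def build_delivery_url(cloud_name, public_id, steps)
  components = steps.map { |step| step.map { |key, value| "#{key}_#{value}" }.join(",") }
  "https://res.cloudinary.com/#{cloud_name}/image/upload/#{components.join('/')}/#{public_id}"
end

url = build_delivery_url("demo", "highway_sign.jpg", [
  { c: "crop", w: 1300, h: 900, g: "south_east" }, # first crop the photo
  { e: "pixelate_region", g: "ocr_text" }          # then pixelate detected text
])
puts url
```

In real code, the SDK helpers shown above are preferable because they also handle escaping and optional URL signing.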

Stay tuned to learn how you can apply these same capabilities to your own site or app.

A winning combination: leading OCR technology + dynamic image transformation functionality

Extracting text from images programmatically is a technology that has existed in some form for many years. It is usually referred to as OCR (Optical Character Recognition).

In recent years, advanced systems have been developed that achieve high recognition accuracy for most fonts and languages. Although no system is 100% accurate, the better ones are getting close.

At Cloudinary, our mission is to offer a comprehensive solution for all elements of image and media management, enabling web and app developers to focus fully on the core purpose of their own site or app. That’s why we decided to offer our new OCR Text Detection and Extraction add-on, which combines our extensive image transformation capabilities with one of the most advanced and precise OCR text extraction engines: Google’s Cloud Vision.

Some real use cases

Using image overlays to cover unwanted text

Suppose your website helps people find their next dream car. It’s a free service for buyers, of course. It’s also fair to sellers, who list their cars for free and pay a commission only if a car sells through your site. But some dealers ignore the site policy and put their direct phone number on the image. A problem? Not with the new OCR Text Detection and Extraction add-on. Take a look at how easy it is to cover any text embedded in an uploaded image using a simple OCR transformation.

For example, the dynamic transformation URL (and corresponding SDK code) shown below performs OCR detection and adds an image as an overlay on top of any detected text. Everything is done on-the-fly in the cloud by adding a few parameters to the code that builds the URL:

Set the overlay parameter to the quikcar_logo image (l_quikcar_logo in the URL)
Set the gravity (location for the overlay) to ocr_text (g_ocr_text in the URL)
Set the region_relative flag with a width of 1.4 (fl_region_relative,w_1.4 in the URL), so the overlay is scaled to 1.4 times the width of each detected text region


URL:

https://res.cloudinary.com/demo/image/upload/l_quikcar_logo,fl_region_relative,w_1.4,g_ocr_text/jeepsale.jpg

Ruby:

cl_image_tag("jeepsale.jpg", :overlay=>"quikcar_logo", :flags=>"region_relative", :width=>1.4, :gravity=>"ocr_text")

PHP v1:

cl_image_tag("jeepsale.jpg", array("overlay"=>"quikcar_logo", "flags"=>"region_relative", "width"=>"1.4", "gravity"=>"ocr_text"))

PHP v2:

(new ImageTag('jeepsale.jpg'))
  ->overlay(
      Overlay::source(Source::image('quikcar_logo')
        ->transformation((new ImageTransformation())
          ->resize(Resize::scale()->width(1.4)->regionRelative())))
      ->position((new Position())
        ->gravity(Gravity::focusOn(FocusOn::ocr()))
  ));

Python:

CloudinaryImage("jeepsale.jpg").image(overlay="quikcar_logo", flags="region_relative", width="1.4", gravity="ocr_text")

Node.js:

cloudinary.image("jeepsale.jpg", {overlay: "quikcar_logo", flags: "region_relative", width: "1.4", gravity: "ocr_text"})

Java:

cloudinary.url().transformation(new Transformation().overlay(new Layer().publicId("quikcar_logo")).flags("region_relative").width(1.4).gravity("ocr_text")).imageTag("jeepsale.jpg");

JS:

cloudinary.imageTag('jeepsale.jpg', {overlay: new cloudinary.Layer().publicId("quikcar_logo"), flags: "region_relative", width: "1.4", gravity: "ocr_text"}).toHtml();

jQuery:

$.cloudinary.image("jeepsale.jpg", {overlay: new cloudinary.Layer().publicId("quikcar_logo"), flags: "region_relative", width: "1.4", gravity: "ocr_text"})

React:

<Image publicId="jeepsale.jpg" >
  <Transformation overlay="quikcar_logo" flags="region_relative" width="1.4" gravity="ocr_text" />
</Image>

Vue.js:

<cld-image publicId="jeepsale.jpg" >
  <cld-transformation overlay="quikcar_logo" flags="region_relative" width="1.4" gravity="ocr_text" />
</cld-image>

Angular:

<cl-image public-id="jeepsale.jpg" >
  <cl-transformation overlay="quikcar_logo" flags="region_relative" width="1.4" gravity="ocr_text">
  </cl-transformation>
</cl-image>

.NET:

cloudinary.Api.UrlImgUp.Transform(new Transformation().Overlay(new Layer().PublicId("quikcar_logo")).Flags("region_relative").Width(1.4).Gravity("ocr_text")).BuildImageTag("jeepsale.jpg")

Android:

MediaManager.get().url().transformation(new Transformation().overlay(new Layer().publicId("quikcar_logo")).flags("region_relative").width(1.4).gravity("ocr_text")).generate("jeepsale.jpg");

iOS:

imageView.cldSetImage(cloudinary.createUrl().setTransformation(CLDTransformation().setOverlay("quikcar_logo").setFlags("region_relative").setWidth(1.4).setGravity("ocr_text")).generate("jeepsale.jpg")!, cloudinary: cloudinary)

Originally uploaded image

Image that is immediately displayed on your site with logo overlay

Photo Credit: Sawinery.net
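The fl_region_relative flag is what gives w_1.4 its meaning here. Instead of being relative to the whole image, the overlay width is relative to each detected text region, so the logo ends up about 40% wider than the text it hides. The following is a minimal sketch of that arithmetic. `region_relative_overlay_width` is a hypothetical helper, and measuring the region as the horizontal extent of its vertices, with rounding to whole pixels, is an assumption rather than Cloudinary's documented internals. The vertices reuse the bounding box of the word "Imagine" from the sample OCR response later in this post.

```ruby
# Hypothetical helper: approximates the overlay width that w_<factor> yields
# when combined with fl_region_relative. Assumption: the region width is the
# horizontal extent of the bounding polygon, and the result is rounded.
def region_relative_overlay_width(vertices, factor)
  xs = vertices.map { |vertex| vertex.fetch("x", 0) } # treat a missing x as 0
  region_width = xs.max - xs.min
  (region_width * factor).round
end

# Bounding box of a detected word (the "Imagine" example from the OCR response)
vertices = [
  { "x" => 760, "y" => 22 }, { "x" => 1039, "y" => 22 },
  { "x" => 1039, "y" => 90 }, { "x" => 760, "y" => 90 }
]
puts region_relative_overlay_width(vertices, 1.4) # 279 px region * 1.4
```

Using a factor greater than 1 gives the logo some margin, so it still covers the text if the detected box is slightly tight.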

Blurring out a brand name

You maintain a blog where you and other users post regularly. To enhance engagement, you make sure to embed lots of interesting images in every article. You don’t want anybody to think that your posts are commercially biased, but these days, (almost) everything is branded. Using Cloudinary’s OCR add-on, it again takes just one line of SDK code (or a manually built URL) with a few parameters to blur out that brand name.

In this case, we take advantage of the blur_region effect at its top blurring strength (2000), and again use the ocr_text gravity so that all detected text regions are blurred. A second, chained transformation then scales the result to a width of 500 pixels:


URL:

https://res.cloudinary.com/demo/image/upload/e_blur_region:2000,g_ocr_text/w_500/piano.jpg

Ruby:

cl_image_tag("piano.jpg", :transformation=>[
  {:effect=>"blur_region:2000", :gravity=>"ocr_text"},
  {:width=>500, :crop=>"scale"}
  ])

PHP v1:

cl_image_tag("piano.jpg", array("transformation"=>array(
  array("effect"=>"blur_region:2000", "gravity"=>"ocr_text"),
  array("width"=>500, "crop"=>"scale")
  )))

PHP v2:

(new ImageTag('piano.jpg'))
  ->effect(Effect::blur()->strength(2000)->region(Region::ocr()))
  ->resize(Resize::scale()->width(500));

Python:

CloudinaryImage("piano.jpg").image(transformation=[
  {'effect': "blur_region:2000", 'gravity': "ocr_text"},
  {'width': 500, 'crop': "scale"}
  ])

Node.js:

cloudinary.image("piano.jpg", {transformation: [
  {effect: "blur_region:2000", gravity: "ocr_text"},
  {width: 500, crop: "scale"}
  ]})

Java:

cloudinary.url().transformation(new Transformation()
  .effect("blur_region:2000").gravity("ocr_text").chain()
  .width(500).crop("scale")).imageTag("piano.jpg");

JS:

cloudinary.imageTag('piano.jpg', {transformation: [
  {effect: "blur_region:2000", gravity: "ocr_text"},
  {width: 500, crop: "scale"}
  ]}).toHtml();

jQuery:

$.cloudinary.image("piano.jpg", {transformation: [
  {effect: "blur_region:2000", gravity: "ocr_text"},
  {width: 500, crop: "scale"}
  ]})

React:

<Image publicId="piano.jpg" >
  <Transformation effect="blur_region:2000" gravity="ocr_text" />
  <Transformation width="500" crop="scale" />
</Image>

Vue.js:

<cld-image publicId="piano.jpg" >
  <cld-transformation effect="blur_region:2000" gravity="ocr_text" />
  <cld-transformation width="500" crop="scale" />
</cld-image>

Angular:

<cl-image public-id="piano.jpg" >
  <cl-transformation effect="blur_region:2000" gravity="ocr_text">
  </cl-transformation>
  <cl-transformation width="500" crop="scale">
  </cl-transformation>
</cl-image>

.NET:

cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Effect("blur_region:2000").Gravity("ocr_text").Chain()
  .Width(500).Crop("scale")).BuildImageTag("piano.jpg")

Android:

MediaManager.get().url().transformation(new Transformation()
  .effect("blur_region:2000").gravity("ocr_text").chain()
  .width(500).crop("scale")).generate("piano.jpg");

iOS:

imageView.cldSetImage(cloudinary.createUrl().setTransformation(CLDTransformation()
  .setEffect("blur_region:2000").setGravity("ocr_text").chain()
  .setWidth(500).setCrop("scale")).generate("piano.jpg")!, cloudinary: cloudinary)

Original, unblurred image

Blurred brand name text
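Because blur_region takes a strength value, code that builds these URLs from user settings might want to keep that value within range. Here is a minimal sketch that rebuilds the URL above and clamps the strength. `blurred_text_url` is a hypothetical helper. The upper bound of 2000 comes from the text above, and the lower bound of 1 is an assumption.

```ruby
# Hypothetical helper: builds the blur-text delivery URL used above, clamping
# the requested strength into blur_region's range. 2000 is the maximum stated
# in this post; the minimum of 1 is an assumption.
def blurred_text_url(cloud_name, public_id, strength:, width:)
  strength = strength.clamp(1, 2000)
  "https://res.cloudinary.com/#{cloud_name}/image/upload/" \
    "e_blur_region:#{strength},g_ocr_text/w_#{width}/#{public_id}"
end

# An out-of-range request falls back to the maximum blur
puts blurred_text_url("demo", "piano.jpg", strength: 5000, width: 500)
```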

Advanced processing using extracted text

Say that your website is based on user-generated content and your income depends on click-through rates. Your users, of course, also want to maximize views of their posts. Images are well known to catch users' eyes and increase engagement, but images containing significant amounts of text tend to be less engaging and may harm the overall experience. For example, Facebook limits the exposure of text-heavy ads.

Luckily, you can help your users avoid uploading images with excessive text content by using the OCR add-on to measure what percentage of an image contains text.

When you include the ocr parameter (set to adv_ocr) in your upload command, the JSON response includes all of the detected text and the exact bounding-box coordinates of each word or text element. By combining this data with some simple math, you can write a short piece of code to:

Allow images with less than 15% text to be uploaded freely.
Provide a warning for images with 15%-30% text, recommending that they use a less text-heavy image, but still allow them to continue if they choose.
Reject images with more than 30% text.

Here's a look at an excerpt from an upload response showing the bounding box of an individual text element extracted from an image:

{
  "boundingPoly": {
    "vertices": [
      {
        "y": 22,
        "x": 760
      },
      {
        "y": 22,
        "x": 1039
      },
      {
        "y": 90,
        "x": 1039
      },
      {
        "y": 90,
        "x": 760
      }
    ]
  },
  "description": "Imagine"
},
{
  "boundingPoly": {
    …
    …
    …
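The sample code that follows measures each box from two opposite corners (vertices 0 and 2), which is exact only for axis-aligned rectangles. If you expect rotated text, one alternative is the shoelace formula, which computes the area of any simple polygon from its ordered vertices. This technique is not part of the original sample. Here is a minimal Ruby sketch. `polygon_area` is a hypothetical helper name, and treating a missing x or y as 0 is an assumption about how zero coordinates may appear in the response.

```ruby
# Hypothetical helper: area of a bounding polygon via the shoelace formula.
# Unlike using two opposite corners, this also handles rotated boxes.
# nil.to_i is 0, so a missing "x"/"y" key is treated as a zero coordinate.
def polygon_area(vertices)
  points = vertices.map { |vertex| [vertex["x"].to_i, vertex["y"].to_i] }
  twice_area = 0
  points.each_with_index do |(x1, y1), i|
    x2, y2 = points[(i + 1) % points.size] # wrap around to the first vertex
    twice_area += x1 * y2 - x2 * y1
  end
  twice_area.abs / 2.0
end

# Bounding box of "Imagine" from the response excerpt above
imagine = [
  { "x" => 760, "y" => 22 }, { "x" => 1039, "y" => 22 },
  { "x" => 1039, "y" => 90 }, { "x" => 760, "y" => 90 }
]
puts polygon_area(imagine) # same as 279 * 68 for this axis-aligned box
```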

And here's some simple sample code (using Ruby on Rails) that accomplishes the text percentage validation described above by calculating the space taken by each of the individual bounding boxes of all text detected in an image:

# result: the upload response for an upload made with ocr: "adv_ocr"
if result['info']['ocr']['adv_ocr']['status'] == 'complete'
  data = result['info']['ocr']['adv_ocr']['data']
  # The first annotation spans all detected text; the rest are individual words
  annotations = data.first.fetch("textAnnotations", []).drop(1)
  # Keep two opposite corners (vertices 0 and 2) of each bounding polygon
  boxes = annotations.map { |annotation| annotation["boundingPoly"]["vertices"].values_at(0, 2) }
  # to_i turns a missing coordinate (nil) into 0, in case zero values are omitted
  areas = boxes.map do |corner, opposite|
    (corner["x"].to_i - opposite["x"].to_i).abs * (corner["y"].to_i - opposite["y"].to_i).abs
  end
  total_area = areas.sum
  coverage = total_area.to_f / (result["width"] * result["height"]) * 100

  puts case
    when coverage < 15
      "Only #{coverage.round(2)}% of your image contains text. This is a valid image!"
    when coverage < 30
      "#{coverage.round(2)}% of your image contains text. For better engagement, " \
        "it is recommended to upload an image with less text."
    else
      "We're sorry. #{coverage.round(2)}% of your image contains text. Please use another image."
  end
end

If a customer uploaded the first image below, the code above would report 12.54%, so the upload would be allowed to continue. The second image would report about 16% and receive a warning, and the third would report nearly 35% and be (politely) rejected.

12.54% text

16.36% text

34.87% text
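To unit-test these thresholds separately from the upload response, you could move them into a pure function. This is a hypothetical refactor, not part of the original sample. `text_coverage_verdict` is an illustrative name, and its boundaries mirror the case statement above: below 15% passes, 15% up to but not including 30% warns, and 30% or more is rejected.

```ruby
# Hypothetical refactor: the same thresholds as the case statement above,
# returned as a symbol so callers decide how to message the user.
def text_coverage_verdict(coverage_percent)
  if coverage_percent < 15
    :allow
  elsif coverage_percent < 30
    :warn
  else
    :reject
  end
end

# The three sample images from this section
[12.54, 16.36, 34.87].each do |coverage|
  puts "#{coverage}% text => #{text_coverage_verdict(coverage)}"
end
```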

In a word

In this article, we've demonstrated a few ways you can use the OCR Text Detection and Extraction add-on to automatically blur, pixelate, overlay, and extract text from your images.

Want to know more? For a deeper look at the add-on's abilities and additional use-case scenarios with sample code, have a look at the add-on documentation.

Ready to give it a try? If you aren't already a Cloudinary customer, you are welcome to sign up for a free account and try the add-on along with the rest of the Cloudinary features.

Have some great ideas for how to make use of the OCR Text Detection and Extraction add-on in your site or app? We’d be happy to hear what you think and appreciate any feedback.
