Cloudinary AI Content Analysis

Last updated: Apr-18-2024

Cloudinary is a cloud-based service that provides solutions for image and video management. These include server or client-side upload, on-the-fly image and video transformations, fast CDN delivery, and a variety of asset management options.

The Cloudinary AI Content Analysis add-on (formerly known as the Cloudinary Object-Aware Cropping add-on) uses AI-based object detection and content-aware algorithms to provide the following functionality:

Object-aware cropping: Ensures that your image crops keep the specific objects that matter to you, even when you significantly modify the aspect ratio.

Original

Crop to the sink

Automatic image tagging: Adds tags to your images based on objects or abstract concepts detected by the content-aware detection models specified on upload, or when invoked on images already stored in your product environment.

AI-based image captioning: Analyzes an image and suggests a caption to use appropriate to the image's contents.

A group of young children playing soccer on a soccer field with a goal post in the foreground and a goal post in the background

On this page:

Getting started
Supported content-aware detection models
Object detection demo
Object-aware cropping
Automatic image tagging
AI-based image captioning

Tip

This page describes how to use the Cloudinary AI Content Analysis add-on programmatically in Programmable Media, but you can also use the add-on for DAM use cases in Assets. For more information, see Cloudinary AI Content Analysis in the Assets user guide.

Getting started

Before you can use the Cloudinary AI Content Analysis add-on:

You must have a Cloudinary account. If you don't already have one, you can sign up for a free account.
Register for the add-on: make sure you're logged in to your account and then go to the Add-ons page. For more information about add-on registrations, see Registering for add-ons.
Keep in mind that many of the examples on this page use our SDKs. For SDK installation and configuration details, see the relevant SDK guide.
If you are new to Cloudinary, you may want to take a look at How to integrate Cloudinary in your app for a walkthrough of the basics of creating and setting up your account, working with SDKs, and then uploading, transforming, and delivering assets.

Important

By default, delivery URLs that use this add-on either need to be signed or eagerly generated. You can optionally remove this requirement by selecting this add-on in the Allow unsigned add-on transformations section of the Security page in the Console Settings. (Cloudinary's demo product environment has this setting applied to make the examples on this page easier to read and try out.)

Supported content-aware detection models

The Cloudinary AI Content Analysis add-on supports a number of built-in content-aware detection models, each supporting a specific set of categories and objects. You can specify which version of each model to invoke for each use of the add-on.

Cloudinary currently supports the following models:

Model	Description
coco	The Common Objects in Context model contains just 80 common objects.
cld-fashion	Cloudinary's fashion model is specifically dedicated to items of clothing. Used with automatic image tagging, the response includes attributes of the clothing identified, for example whether the garment contains pockets, its material and the fastenings used.
lvis	The Large Vocabulary Instance Segmentation model contains thousands of general objects.
unidet	The UniDet model is a unified model, combining a number of object models, including Objects365, which focuses on diverse objects in the wild.
openimages	Google's Open Images Dataset model contains 600 general objects.
human-anatomy	Cloudinary's human anatomy model identifies parts of the human body in an image. It works best when the majority of a human body is detected in the image.
cld-text	Cloudinary's text model tells you if your image includes text, and where it's located. Used with automatic image tagging, you can then search for images that contain blocks of text. Used with object-aware cropping, you can choose to keep only the text part, or specify a crop that avoids the text.
shop-classifier	Cloudinary's shop classifier model detects if the image is a product image taken in a studio, or if it's a natural image.
image-type	Cloudinary's image type model detects generic properties about a photographic image, for example, photographic style, setting and time of the photo.

Model capabilities

This table shows the capabilities of each supported version of each model:

Default version is the version of the model that is invoked if left unspecified.
Version indicates support for a particular version of the model - different versions have different accuracies.
Default confidence shows the confidence level used when auto_tagging is set to default.
Tag indicates support for returning tags. This is a required capability for automatic image tagging.
Confidence indicates support for returning confidence levels.
Bounding Box indicates support for returning bounding boxes. This is a required capability for object-aware cropping.
Attributes indicates support for returning attributes for each tag in a (key,value) list.

Notes

If you are using our Asia Pacific data center, currently you can use only the COCO and Open Images models.
If you have difficulty accessing any of the models, please contact support.

Supported objects and categories

For object-aware cropping:
- The Full URL Syntax column shows the syntax to use to detect a specific object or category in a particular version of a model (e.g. coco_v2_tie). You can also omit the version (e.g. coco_tie), or both the model and version (e.g. tie).
For automatic image tagging:
- You can specify the model and version (e.g. coco_v2), or only the model (e.g. coco).
For video tracking layers:
- Specify the object from the cld-fashion model (e.g. g_track_person:obj_hat)

Private models

If you have your own content-aware detection models that you would like to use, these can be integrated as private models that work only on your product environment. This service is provided for customers on Enterprise plans through Professional Services. Contact our Enterprise support and sales team or your CSM to find out more.

Object detection demo

This demo lets you choose one of the content-aware detection models, and shows up to twenty objects that are detected by that model in an image of your choice.

Tip

To see a full list of all the detected objects and other information returned by the model, expand the JSON that appears under the image after upload.

Automatic image tagging is requested on upload, and the response provides the necessary information to overlay bounding boxes around the detected objects, together with the confidence level.

Learn more

Read this blog to discover all the Cloudinary features in this demo.

Object-aware cropping

When object-aware cropping is invoked, Cloudinary applies advanced AI-based object detection algorithms on the fly during the crop process. You can either use it in conjunction with auto-gravity to give higher priority to the objects you care about, or directly specify that the crop should be exactly based on the detected coordinates of the specified objects.

Watch this demo to see how the same image is cropped according to the parameters specified in the URL:

Applying object-aware cropping

After registering for the Cloudinary AI Content Analysis add-on, you can apply it in one of two ways:

Automatic gravity with a high weighting towards a specified object
This variant of auto-gravity cropping enables you to indicate specific objects or object categories that should be given priority when parts of a photo are cropped out. This is done by specifying an object or an object category as the focal_gravity attribute for the auto gravity parameter (for example, g_auto:cat in URLs) together with a cropping option. If the specified content is not found in the image, the gravity is determined by the standard auto-gravity algorithm.
Object-specific gravity
By specifying an object or object category as the gravity parameter (for example, g_cat in URLs) together with a cropping option, you can accurately crop around objects without needing to specify dimensions or aspect ratio. If the specified content is not found in the image, the gravity remains at the center of the image.

When specifying an object or category, you can optionally include a specific model (that supports bounding boxes) and version. For example, you can specify:

Only the object/category, e.g.: g_auto:cat or g_cat
The model with the object/category, e.g.: g_auto:coco_cat or g_coco_cat
The model and version with object/category, e.g.: g_auto:coco_v2_cat or g_coco_v2_cat

If you choose not to specify a model, each model that supports bounding boxes is invoked in turn until the specified content is detected. The order in which they are invoked is: coco > cld-fashion > lvis > unidet > openimages > human-anatomy > cld-text.

Note

If you have any private models set up, these are invoked first, in the order that was predefined for your product environment.
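The three forms listed above can be assembled mechanically. The following Python sketch is illustrative only: gravity_value is a hypothetical helper, not part of any Cloudinary SDK, and it assumes that a version is only ever given together with a model, as in the examples above.

```python
from typing import Optional


def gravity_value(obj: str, model: Optional[str] = None,
                  version: Optional[int] = None, auto: bool = False) -> str:
    """Build a gravity value such as 'coco_v2_cat' or 'auto:coco_v2_cat'."""
    if version is not None and model is None:
        raise ValueError("a version can only be given together with a model")
    parts = []
    if model is not None:
        parts.append(model)
        if version is not None:
            parts.append(f"v{version}")
    parts.append(obj)
    name = "_".join(parts)
    # auto=True selects the auto-gravity variant (g_auto:<object>)
    return f"auto:{name}" if auto else name


print("g_" + gravity_value("cat"))                        # g_cat
print("g_" + gravity_value("cat", "coco", auto=True))     # g_auto:coco_cat
print("g_" + gravity_value("cat", "coco", 2, auto=True))  # g_auto:coco_v2_cat
```

Omitting the model leaves the choice to the invocation order described above, so pinning a model (and version) is mainly useful when you want predictable detections.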

Consider the original image of a kitchen below:

Using auto-gravity, you can deliver a square thumbnail crop that prioritizes the detected coordinates of the sink, microwave, or refrigerator. To do this, specify the relevant object option for the g_auto gravity definition in conjunction with the thumb or auto cropping option:

g_auto:sink

g_auto:microwave

g_auto:refrigerator

Using object-specific gravity, you can choose not to give dimensions or aspect ratio, and deliver an image that is tightly cropped to the object. To do this, specify the relevant object option for the gravity definition in conjunction with the crop cropping option:

g_sink

g_microwave

g_refrigerator

You can also specify an aspect ratio together with the crop cropping option, without including specific dimensions. This keeps the object but may show more of the image to fit the aspect ratio.

g_sink

g_microwave

g_refrigerator

In addition to the crop, thumb and auto cropping modes, object-aware cropping can also be used with the fill and lfill (limit fill) cropping modes. The fill_pad and auto_pad cropping modes work with the auto-gravity variant of object-aware cropping, but not with object-specific gravity.
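As a rough sketch of how the kitchen examples and the cropping-mode rules above fit together, the following Python helpers build a transformation string and a delivery URL. Both helpers are hypothetical (not SDK functions). The kitchen.jpg public ID, the 300-pixel width, and the use of the demo cloud name are assumptions for illustration. The URL layout follows the standard res.cloudinary.com delivery pattern.

```python
from typing import Optional, Union

# Modes that work with both auto-gravity and object-specific gravity
SHARED_MODES = {"crop", "thumb", "auto", "fill", "lfill"}
# Modes that only work with the auto-gravity variant
AUTO_GRAVITY_ONLY_MODES = {"fill_pad", "auto_pad"}


def crop_transformation(mode: str, gravity: str,
                        width: Optional[int] = None,
                        aspect_ratio: Optional[Union[int, float, str]] = None) -> str:
    """Return a transformation such as 'ar_1,c_thumb,g_auto:sink,w_300'."""
    uses_auto_gravity = gravity == "auto" or gravity.startswith("auto:")
    if mode in AUTO_GRAVITY_ONLY_MODES:
        if not uses_auto_gravity:
            raise ValueError(f"c_{mode} requires auto-gravity (g_auto:...)")
    elif mode not in SHARED_MODES:
        raise ValueError(f"unsupported cropping mode: {mode}")
    parts = []
    if aspect_ratio is not None:
        parts.append(f"ar_{aspect_ratio}")
    parts += [f"c_{mode}", f"g_{gravity}"]
    if width is not None:
        parts.append(f"w_{width}")
    return ",".join(parts)


def delivery_url(cloud_name: str, public_id: str, transformation: str) -> str:
    """Assemble an (unsigned) image delivery URL."""
    return f"https://res.cloudinary.com/{cloud_name}/image/upload/{transformation}/{public_id}"


# Square thumbnail that prioritizes the sink (auto-gravity variant)
print(delivery_url("demo", "kitchen.jpg",
                   crop_transformation("thumb", "auto:sink", width=300, aspect_ratio=1)))
# Tight crop around the sink, no dimensions needed (object-specific gravity)
print(delivery_url("demo", "kitchen.jpg", crop_transformation("crop", "sink")))
```

Remember that, by default, URLs like these must be signed or eagerly generated unless unsigned add-on transformations are allowed for your product environment.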

Notes on specifying categories and objects

When applying object-aware cropping, you can specify either individual objects or more general object categories.

When you specify a category, the algorithm gives priority to any objects that are detected from that category.
The regular auto-gravity behavior also impacts the cropping decision. But if requested objects are detected, they get significantly higher priority than the subjects or salient areas that the regular auto-gravity algorithm selects.
If you specify the generic object category with auto-gravity (g_auto:object), then any detected objects from any category get priority.
If there are multiple objects of the same type in the image, object-specific gravity selects the most prominent of the objects, and bases its crop around only that object, whereas auto-gravity may choose to keep more than one of the objects in the crop.
The categories and objects also work in their plural forms when using object-specific gravity. So, for example, c_crop,g_birds keeps all birds in the crop, whereas c_crop,g_bird keeps only the most prominent bird.

Combining focal gravity options using auto-gravity

When using auto gravity to determine the area to keep in a crop, you can specify multiple focal_gravity options.

This means that in a single auto-gravity parameter, you can optionally specify:

One or multiple objects (from the same or different categories and/or models)
Built-in focal gravity options such as face/faces or custom_no_override
Other add-on based focal gravity options, such as the adv_face, adv_eyes options from the Advanced Facial Attributes Detection add-on
Only the classic or only the subject auto-gravity algorithm, which in some cases may have some impact on the exact coordinates of the crop, even if other specified objects or focal gravity options are detected. Note that the default algorithm, which combines both of these algorithms, is recommended in the majority of cases.

For example, your auto-gravity URL parameter might be: g_auto:cat:sofa:faces:adv_eyes

This would instruct the cropping mechanism to give top priority to any cats, sofas, faces, or eyes detected in the photo.

For a complete list of all focal_gravity options, see the g_<special_position> section of the Transformation URL API Reference.

Important

The focal gravity options can be specified in any order. The order does not impact the result.
When multiple items are detected that match the requested focal options, larger, more central, and more in-focus (less blurry) objects will get higher priority.
In special cases, it's possible to fine-tune this default prioritization further. For details, contact support.
If a particular image has custom coordinates defined, those coordinates always override all other focal gravity options, unless you use the custom_no_override option in conjunction with the other options.
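A minimal sketch of composing an auto-gravity parameter from several focal gravity options. The auto_gravity helper is hypothetical and does nothing more than join the options with colons, as in the g_auto:cat:sofa:faces:adv_eyes example above.

```python
def auto_gravity(*focal_options: str) -> str:
    """Join focal gravity options into a single g_auto parameter."""
    for option in focal_options:
        if not option or ":" in option:
            raise ValueError(f"invalid focal gravity option: {option!r}")
    return ":".join(("g_auto",) + focal_options)


print(auto_gravity("cat", "sofa", "faces", "adv_eyes"))  # g_auto:cat:sofa:faces:adv_eyes
print(auto_gravity())                                    # g_auto
```

Because the order of options does not affect the result with auto-gravity, you could sort the options before joining to keep derived-asset URLs consistent. That is a design choice rather than a requirement.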

Combining focal gravity options using object-specific gravity

When using object-specific gravity to determine the area to keep in a crop, you can specify multiple focal_gravity options, but unlike auto-gravity, the order in which they are specified has an impact on the delivered image.

For example, consider this photo of a cat and dog:

By setting the gravity parameter to cat:dog the cat gets precedence:

Whereas, if you switch the order to dog:cat the dog gets precedence:

You can also combine the auto option to invoke the auto-gravity algorithm if none of the specified objects are found. For example:

g_dog:cat:auto - auto-gravity is invoked only if no dogs or cats are detected.
g_dog:auto:cat - auto-gravity weighted by cat (g_auto:cat) is invoked if no dogs are detected.

Important

If you use the auto option then you also need to specify at least one dimension parameter (width or height).
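The ordering and the dimension requirement described above can be captured in a small helper. This is a sketch only: object_gravity_crop is a hypothetical function, and the 300-pixel width in the usage lines is an arbitrary value.

```python
from typing import Optional, Sequence


def object_gravity_crop(options: Sequence[str], width: Optional[int] = None,
                        height: Optional[int] = None, mode: str = "crop") -> str:
    """Build a transformation using object-specific gravity.

    The order of `options` matters: earlier objects take precedence.
    """
    if not options:
        raise ValueError("at least one gravity option is required")
    if "auto" in options and width is None and height is None:
        raise ValueError("the auto option needs at least one of width or height")
    parts = [f"c_{mode}", "g_" + ":".join(options)]
    if height is not None:
        parts.append(f"h_{height}")
    if width is not None:
        parts.append(f"w_{width}")
    return ",".join(parts)


print(object_gravity_crop(["cat", "dog"]))                     # c_crop,g_cat:dog
print(object_gravity_crop(["dog", "cat", "auto"], width=300))  # c_crop,g_dog:cat:auto,w_300
print(object_gravity_crop(["dog", "auto", "cat"], width=300))  # c_crop,g_dog:auto:cat,w_300
```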

For example, consider this photo of a cat and three birds:

As there is no dog in the photo, auto-gravity weighted by bird is invoked when using dog:auto:bird. In this case, two birds are kept in the crop:

Notice that if auto-gravity is not specified, the object-specific algorithm chooses the most prominent bird out of the three and only keeps this bird in the crop:

Specifying objects to avoid using auto-gravity

In addition to specifying objects to keep in an image, you can specify objects that you would rather not see. To minimize the likelihood of including a particular object in the cropped image, use auto-gravity with the avoid option for the relevant object or category.

For example, in photos like the one below, you may prefer not to include people because the purpose of the photo is to show an interesting store front, and the people are a distraction.

Using g_auto by itself makes the people the focal point, but if we use g_auto:person_avoid, the other side of the photo is shown, without the people.

g_auto

g_auto:person_avoid

Choosing the cropping mode

When you specify an object, either directly as the gravity or as part of your auto-gravity parameter, the object-aware cropping algorithm detects the coordinates of the object, and the cropping mode then uses those coordinates.

When using thumb cropping (c_thumb), the image is cropped as closely as possible to the detected coordinates of the object given the requested aspect ratio, and then scaled to the requested pixel size. Note that if the requested pixel size is greater than the crop, the image is not scaled up, but filled with further pixels from the image.
When using crop mode (c_crop), the detected coordinates are prioritized as the area to keep when determining how much to cut from each edge of the photo in order to achieve the requested pixel size. If using auto-gravity and the requested pixel size is larger than the coordinates of the detected object, other elements of the image that receive priority from g_auto may impact what else is included in the photo and where in your resulting image the detected object may be located, meaning that the detected object will not necessarily be the center of the photo.
When using any of the fill-based modes (c_fill, c_lfill, c_fill_pad), the coordinates of the detected object should be retained if any cropping is required after scaling. If using auto-gravity, other elements of the image that receive priority from g_auto may impact what else is included in the photo and where in your resulting image the detected object may be located, meaning that the detected object will not necessarily be the center of the photo.
When using the auto cropping mode (c_auto), the crop is focused on the object, but also takes into account more of the whole picture, so it gives a more 'zoomed out' result than thumb and crop, but a more 'zoomed in' result than fill. If the requested dimensions are smaller than the best crop, the result is downscaled. If the requested dimensions are larger than the original image, the result is upscaled.

The following examples show how different your cropping results may be for the same requested object in the gravity, but with different cropping modes. In this case, we take the original photo below and apply g_auto:camera and g_camera with fill, crop, thumb and auto cropping modes. In all cases, the same width and aspect ratio are requested (ar_1,w_200).

Original

g_auto:camera

c_fill

c_crop

c_thumb

c_auto

g_camera

c_fill

c_crop

c_thumb

c_auto
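To reproduce a comparison like the one above for your own images, you could generate all eight transformation strings in one go. This sketch only builds the strings (using the ar_1,w_200 settings from the example). comparison_transformations is a hypothetical helper name.

```python
from itertools import product

GRAVITIES = ("auto:camera", "camera")        # auto-gravity vs. object-specific gravity
MODES = ("fill", "crop", "thumb", "auto")    # the four cropping modes compared above


def comparison_transformations():
    """Return one transformation per gravity/mode pair, all with ar_1,w_200."""
    return [f"ar_1,c_{mode},g_{gravity},w_200"
            for gravity, mode in product(GRAVITIES, MODES)]


for transformation in comparison_transformations():
    print(transformation)
```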

Using object-aware cropping for responsive delivery

You can take advantage of object-aware cropping with various cropping modes to assist in responsive art direction. This means that when you deliver different sized images to different devices, you don't just scale the same image, but rather crop images differently for different sizes, so that the important objects are always highly visible.

For example, you may:

deliver a full-size image to large HD screens
use g_auto:[your_important_object], or g_[your_important_object] with fill cropping for medium sized screens
use g_auto:[your_important_object], or g_[your_important_object] with thumb or auto cropping for very small screens.

For more details on delivering responsive images, see the Responsive images guide.
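A minimal sketch of the art-direction idea above. The breakpoints (1200 and 600 pixels) and the output dimensions are assumptions chosen purely for illustration, and responsive_transformation is a hypothetical helper.

```python
from typing import Optional


def responsive_transformation(viewport_width: int, important_object: str,
                              auto_gravity: bool = True) -> Optional[str]:
    """Pick a crop for the viewport; None means deliver the full-size image."""
    gravity = f"g_auto:{important_object}" if auto_gravity else f"g_{important_object}"
    if viewport_width >= 1200:   # large screens: no object-aware crop
        return None
    if viewport_width >= 600:    # medium screens: fill cropping
        return f"c_fill,{gravity},h_400,w_800"
    return f"c_thumb,{gravity},h_200,w_200"  # very small screens: thumb cropping


print(responsive_transformation(1440, "camera"))  # None
print(responsive_transformation(768, "camera"))   # c_fill,g_auto:camera,h_400,w_800
print(responsive_transformation(320, "camera", auto_gravity=False))  # c_thumb,g_camera,h_200,w_200
```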

Using objects with the zoompan effect

In addition to cropping, the Cloudinary AI Content Analysis add-on allows you to use objects for start and end points of a zoompan transformation.

The zoompan effect lets you create a video or animated GIF from an image by zooming and panning from one area of the image to another. Use the from and/or to options with objects as gravity and specify a video or animated image format.

The example below is a seven second MP4 video (.mp4) of a model wearing fashionable items, starting zoomed into the hat (from_(g_hat;zoom_4.5)), then zooming out and panning to the pants (to_(g_pants;zoom_1.6)).
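The from and to settings above can be combined into a single effect string. This sketch assumes the e_zoompan: prefix, the semicolon separator between settings, and the du_ (duration in seconds) setting as described in the Transformation URL API Reference. Check that reference before relying on the exact syntax. zoompan_effect is a hypothetical helper.

```python
from typing import Optional


def zoompan_effect(from_object: str, from_zoom: float,
                   to_object: str, to_zoom: float,
                   duration: Optional[int] = None) -> str:
    """Build a zoompan effect that moves from one detected object to another."""
    settings = []
    if duration is not None:
        settings.append(f"du_{duration}")  # assumed duration setting, in seconds
    settings.append(f"from_(g_{from_object};zoom_{from_zoom})")
    settings.append(f"to_(g_{to_object};zoom_{to_zoom})")
    return "e_zoompan:" + ";".join(settings)


print(zoompan_effect("hat", 4.5, "pants", 1.6))
# e_zoompan:from_(g_hat;zoom_4.5);to_(g_pants;zoom_1.6)
```

The effect only produces motion when the asset is delivered in a video or animated image format, for example by requesting the image with an .mp4 extension.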

Signed URLs

Cloudinary's dynamic image transformation URLs are powerful tools. However, due to the potential costs of your customers experimenting with dynamic URLs that apply the object-aware cropping algorithm, image transformation add-on URLs are required (by default) to be signed using Cloudinary's authenticated API. Alternatively, you can eagerly generate the requested derived images using Cloudinary's authenticated API.

To create a signed delivery URL, set the sign_url parameter to true when building a URL or creating an image tag.

The following code example applies object-aware cropping to the skater image, including a signed Cloudinary URL:
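One way this could look with the Python SDK is sketched below. The gravity value (auto:skateboard), the 300x300 thumb crop, and the helper names are assumptions for illustration, so the signature you get will differ from any signature quoted on this page. The sketch relies on cloudinary.utils.cloudinary_url, which requires the cloudinary package and a configured cloud name and API secret.

```python
def signed_crop_options(gravity: str, width: int, height: int, crop: str = "thumb") -> dict:
    """Options for an object-aware crop delivered through a signed URL."""
    return {
        "gravity": gravity,
        "width": width,
        "height": height,
        "crop": crop,
        "sign_url": True,  # adds the s--...-- signature component to the URL
    }


def signed_url(public_id: str, **options) -> str:
    """Build the delivery URL with the SDK (imported lazily; needs credentials)."""
    from cloudinary.utils import cloudinary_url  # pip install cloudinary
    url, _ = cloudinary_url(public_id, **options)
    return url


# Example call (not executed here because it needs your credentials):
# signed_url("skater.jpg", **signed_crop_options("auto:skateboard", 300, 300))
```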

The generated Cloudinary URL shown below includes a signature component (/s--acvfjq2y--/). Only URLs with a valid signature that matches the requested image transformation will be approved for on-the-fly image transformation and delivery.

For more details on signed URLs, see Signed delivery URLs.

Note

You can optionally remove the signed URL default requirement for a particular add-on by selecting that add-on in the Allow unsigned add-on transformations section of the Security page in the Cloudinary Console Settings.

Automatic image tagging

The automatic image tagging behavior of the Cloudinary AI Content Analysis add-on can be invoked on uploading an image, or by updating an image that's already stored in your product environment. Using the specified model, it analyzes the image, identifies categories and objects, and suggests tags that could be applied to the image.

Object and category detection

Take a look at the following photo of a woman dressed fashionably for winter:

By setting the detection parameter to the name of the model (and optionally the version, e.g. cld-fashion_v3) you want to invoke when calling Cloudinary's upload or update methods, the add-on automatically analyzes the content of the uploaded or specified existing image. For example, invoking the cld-fashion detection model while uploading winter_fashion.jpg:
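A sketch of such an upload with the Python SDK. The helper names are hypothetical, and the call itself needs the cloudinary package plus configured credentials, so it is wrapped in a function rather than executed.

```python
from typing import Optional


def detection_value(model: str, version: Optional[int] = None) -> str:
    """Return the detection parameter value, e.g. 'cld-fashion' or 'cld-fashion_v3'."""
    return model if version is None else f"{model}_v{version}"


def upload_with_detection(path: str, model: str, version: Optional[int] = None) -> dict:
    """Upload an image and request content analysis from the given model."""
    import cloudinary.uploader  # pip install cloudinary
    return cloudinary.uploader.upload(path, detection=detection_value(model, version))


# Example call (requires credentials):
# response = upload_with_detection("winter_fashion.jpg", "cld-fashion")
```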

Tip

You can use upload presets to centrally define a set of upload options including add-on operations to apply, instead of specifying them in each upload call. You can define multiple upload presets, and apply different presets in different upload scenarios. You can create new upload presets in the Upload page of the Console Settings or using the upload_presets Admin API method. From the Upload page of the Console Settings, you can also select default upload presets to use for image, video, and raw API uploads (respectively) as well as default presets for image, video, and raw uploads performed via the Media Library UI.

Learn more: Upload presets

The upload API response includes the categories and objects automatically identified by the model you requested. As can be seen in the response snippet below, a hat and a specific type of outerwear are automatically detected in the uploaded photo. Depending on the capabilities of each model, different information is returned. In the example below, a confidence score, bounding box and in some cases, attributes, are returned for each detected object. The confidence score is a numerical value representing the certainty of a correct detection, where 1.0 means 100% confidence. The bounding-box parameter shows the location of the object in the image, as an array: [x-coordinate of top left corner, y-coordinate of top left corner, width of box, height of box]. Bounding-box information is used in the object detection demo.
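To work with the bounding boxes described above (for example, to draw overlays), it helps to convert each [x, y, width, height] array into corner coordinates. The sketch below assumes the detected objects arrive as a mapping from tag name to a list of instances, each with bounding-box and confidence keys. The sample values are invented, so adapt the lookup to the actual response you receive.

```python
from typing import Dict, List, NamedTuple


class Detection(NamedTuple):
    tag: str
    confidence: float
    left: float
    top: float
    right: float
    bottom: float


def flatten_detections(tags: Dict[str, List[dict]]) -> List[Detection]:
    """Flatten per-tag instances into a list sorted by descending confidence."""
    detections = []
    for tag, instances in tags.items():
        for item in instances:
            x, y, width, height = item["bounding-box"]  # top-left corner plus size
            detections.append(
                Detection(tag, item["confidence"], x, y, x + width, y + height))
    return sorted(detections, key=lambda d: d.confidence, reverse=True)


# Invented sample data in the assumed shape
sample_tags = {
    "jacket": [{"bounding-box": [80, 200, 400, 500], "confidence": 0.85}],
    "hat": [{"bounding-box": [100, 50, 200, 120], "confidence": 0.92}],
}
for detection in flatten_detections(sample_tags):
    print(detection)
```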

Adding tags to images

By providing the auto_tagging parameter to an upload or update request, images are automatically assigned tags based on the detected content. The value of the auto_tagging parameter is the minimum confidence score of a detected category or object that should be automatically used as an assigned tag. You can also set auto_tagging to default, which uses the model's default confidence.

The following code example automatically tags an uploaded image with all detected categories that have a confidence score higher than 0.6.
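A sketch of this upload using the Python SDK. The model choice in the example call and the helper names are assumptions. The validation simply reflects that auto_tagging is either a confidence between 0 and 1 or the string default.

```python
from typing import Union


def tagging_options(model: str, auto_tagging: Union[float, str] = 0.6) -> dict:
    """Upload options that request detection and automatic tagging."""
    if auto_tagging != "default":
        is_number = isinstance(auto_tagging, (int, float)) and not isinstance(auto_tagging, bool)
        if not is_number or not 0 <= auto_tagging <= 1:
            raise ValueError("auto_tagging must be between 0 and 1, or 'default'")
    return {"detection": model, "auto_tagging": auto_tagging}


def upload_and_tag(path: str, model: str, auto_tagging: Union[float, str] = 0.6) -> dict:
    """Upload an image and tag it with detections at or above the threshold."""
    import cloudinary.uploader  # pip install cloudinary; needs credentials
    return cloudinary.uploader.upload(path, **tagging_options(model, auto_tagging))


# Example call (requires credentials):
# response = upload_and_tag("winter_fashion.jpg", "cld-fashion", 0.6)
# print(response.get("tags"))
```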

The response to the upload request returns the detected categories as well as the assigned tags for categories meeting the minimum confidence score of 0.6:

You can also use the update method to apply auto tagging to images already stored in your product environment.

The following example uses Cloudinary's update method on the puppy image in the product environment, to detect objects and categories in the LVIS model. Tags are automatically assigned based on the objects and categories detected with over a 90% confidence level.
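With the Python SDK, this could be sketched as follows. The helper names are hypothetical. The sketch converts the 90% level to the 0.9 value that auto_tagging expects and calls cloudinary.api.update, which needs the cloudinary package and Admin API credentials.

```python
def update_options(model: str, min_confidence_percent: float) -> dict:
    """Options for re-analyzing a stored image, with the threshold given in percent."""
    if not 0 <= min_confidence_percent <= 100:
        raise ValueError("confidence percentage must be between 0 and 100")
    return {"detection": model, "auto_tagging": min_confidence_percent / 100}


def retag_existing(public_id: str, model: str = "lvis",
                   min_confidence_percent: float = 90) -> dict:
    """Run detection on an existing asset and assign tags above the threshold."""
    import cloudinary.api  # pip install cloudinary
    return cloudinary.api.update(public_id, **update_options(model, min_confidence_percent))


# Example call (requires credentials):
# retag_existing("puppy")
```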

You can use the Admin API's resources_by_tag method to return all resources with a certain tag, for example hat:
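A sketch using the Python SDK's cloudinary.api.resources_by_tag. The public_ids_with_tag wrapper is hypothetical. It accepts an optional api object so that the logic can be tried out without credentials, and it only reads the first page of results.

```python
def public_ids_with_tag(tag: str, api=None, **options) -> list:
    """Return the public IDs of resources carrying the given tag (first page only)."""
    if api is None:
        import cloudinary.api as api  # pip install cloudinary; needs credentials
    response = api.resources_by_tag(tag, **options)
    return [resource["public_id"] for resource in response.get("resources", [])]


# Example call (requires credentials):
# print(public_ids_with_tag("hat", max_results=50))
```

For large result sets, the response includes a next_cursor value that you can pass back to fetch the following page.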

You can also use the search method or the Media Explorer advanced search to find images with certain tags.

Asynchronous handling

As automatic image tagging may not be immediate, it is good practice to use asynchronous handling for these calls.

To make the call asynchronous, set the async parameter of the upload method to true. To be notified when the processing is complete, you can either set the notification_url parameter of the upload method (as in the example below) or the global webhook Notification URL in the Upload page of your Cloudinary Console Settings.
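A sketch of an asynchronous upload with the Python SDK. Because async is a reserved word in Python, the option is passed by unpacking a dictionary instead of as a keyword argument. The helper names and the notification URL in the example call are placeholders.

```python
from typing import Optional, Union


def async_upload_options(model: str, notification_url: str,
                         auto_tagging: Optional[Union[float, str]] = None) -> dict:
    """Options for an asynchronous upload that reports back to a webhook."""
    options = {
        "detection": model,
        "async": True,
        "notification_url": notification_url,
    }
    if auto_tagging is not None:
        options["auto_tagging"] = auto_tagging
    return options


def upload_async(path: str, model: str, notification_url: str, **kwargs) -> dict:
    """Start the upload; the full result is later sent to notification_url."""
    import cloudinary.uploader  # pip install cloudinary; needs credentials
    return cloudinary.uploader.upload(
        path, **async_upload_options(model, notification_url, **kwargs))


# Example call (requires credentials; the URL is a placeholder):
# upload_async("winter_fashion.jpg", "cld-fashion",
#              "https://example.com/cloudinary/notify", auto_tagging=0.6)
```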

The response to an asynchronous upload call looks similar to this:

When the processing is finished, the complete upload response is sent to the notification URL that you specified.

AI-based image captioning

Important

AI-based image captioning is currently in Beta. There may be minor changes in functionality before the general access release. We would appreciate any feedback via our support team.

The Cloudinary AI Content Analysis add-on can be used to analyze an image and suggest a caption based on the image's contents.

Some example captions suggested by the AI:

A brown dog standing on top of a street next to a sidewalk with a building in the background
A group of young children playing soccer on a soccer field with a goal post in the foreground and a goal post in the background
A hand reaching for a donut with chocolate and sprinkles on it on a dark surface

By setting the detection parameter to captioning when calling Cloudinary's upload or update methods, the add-on automatically analyzes the content of the image. For example, invoking the captioning detection model while uploading toy_room.jpg:

Tip

Learn more: Upload presets

The upload API response includes the captioning information:

Tips

You can retrieve the caption text value from the response and then use the update method of the Admin API to add the caption text to the metadata of images stored in your product environment, such as the contextual metadata (context) or a structured metadata field (metadata).
After you've requested a caption using the upload or update method, you can use the Admin API's get details of a single resource method to return the details of the image, including the stored caption value.
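The flow described in these tips can be sketched with the Python SDK as follows. The response path used in extract_caption (info, detection, captioning, data, caption) and the caption key used for the contextual metadata are assumptions, so verify them against the response you actually receive. The helper names are hypothetical.

```python
from typing import Optional


def extract_caption(upload_response: dict) -> Optional[str]:
    """Return the suggested caption, or None if it is missing or not yet complete."""
    captioning = (upload_response.get("info", {})
                  .get("detection", {})
                  .get("captioning", {}))
    if captioning.get("status") != "complete":
        return None
    return captioning.get("data", {}).get("caption")


def upload_and_store_caption(path: str) -> Optional[str]:
    """Upload with captioning, then save the caption as contextual metadata."""
    import cloudinary.api       # pip install cloudinary; needs credentials
    import cloudinary.uploader
    response = cloudinary.uploader.upload(path, detection="captioning")
    caption = extract_caption(response)
    if caption is not None:
        cloudinary.api.update(response["public_id"], context={"caption": caption})
    return caption


# Invented sample in the assumed response shape
sample = {"info": {"detection": {"captioning": {
    "status": "complete",
    "data": {"caption": "A hand reaching for a donut"}}}}}
print(extract_caption(sample))  # A hand reaching for a donut
```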

Asynchronous handling

As the response may not be immediate, it is good practice to use asynchronous handling for these calls.

The response to an asynchronous upload call looks similar to this:

When the processing is finished, the complete upload response is sent to the notification URL that you specified.
