Image classification is at the core of many popular products and features - from Facebook's photo-tagging to Tesla's self-driving cars. This article gives an introduction to what image classifiers are and why they matter.
The different ways of processing an image
There are different ways to process images, and image classification is one of the three main ones:
- Image classification: What is the type of the image? Outputs a class, e.g. "dog" or "cat" – more on that later.
- Object detection: Where is the object? The model outputs the coordinates of a so-called "bounding box" around the object to be found in the image. It is used if you already know that you are looking for dogs and want to know whether a specific image depicts a dog and, if yes, where it is. Common use cases include cropping portraits or capturing important information in images.
- Image segmentation: What is the shape of the object? The model creates a pixel-wise mask for each object in the image. Image segmentation gives more detailed information on size and shape. Although these models tend to be more computationally expensive, they are often used to improve the efficiency of the system as a whole: downstream algorithms can continue processing only the relevant parts of an image. As an example, Apple's Face ID on smartphones only needs to analyze lines and shapes within the outline of the face, not the background.
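To make the contrast between the three tasks concrete, here is a toy sketch of their output types: a class-probability vector, a bounding box, and a pixel-wise mask. All values are invented for illustration; no real model is involved.

```python
import numpy as np

# Image classification: one probability per class for the whole image.
class_probs = {"dog": 0.92, "cat": 0.08}

# Object detection: a class plus a bounding box (x, y, width, height) in pixels.
detection = {"class": "dog", "box": (40, 60, 120, 90)}

# Image segmentation: a pixel-wise mask, 1 where the object is, 0 elsewhere.
mask = np.zeros((4, 6), dtype=int)
mask[1:3, 2:5] = 1  # a small rectangular "object" region

print(class_probs)
print(detection)
print(mask.sum(), "object pixels out of", mask.size)
```

Note how the amount of spatial detail grows from task to task: a single label, then four box coordinates, then one value per pixel.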
What is image classification?
Imagine the classic example: You are given a set of images, each of which depicts either a cat or a dog. Instead of labeling the pictures all on your own, you want an algorithm to do the work for you: It "looks" at the whole picture and outputs probabilities for each of the classes it was trained on.
This is usually made possible by training neural networks, which we describe in more detail in other articles. (Note: There are other techniques, but they do not play a role in practice due to their performance.) As in other applications of supervised learning, the network is fed with sufficient training data – namely labeled images of cats and dogs.
What happens between the image and the output is somewhat obscure, and we go into greater detail in other posts. But in simple terms, most networks break down the image into abstract shapes and colors, which are used to form a hypothesis regarding the image's content.
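The final step of most classification networks is easy to show in isolation: a softmax function turns the network's raw scores (logits) into the class probabilities mentioned above. The logit values below are invented; in a real network they would come from the preceding layers.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

classes = ["cat", "dog"]
logits = np.array([1.2, 3.4])  # invented raw scores from a hypothetical network
probs = softmax(logits)

for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
```

Whatever the logits are, the softmax output always sums to 1, which is what lets us read it as a probability distribution over the trained classes.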
In case you want to dig deeper, we are linking a few resources at the end of this article.
Applications for image classification
People (including us) keep using the example of labeling pictures of dogs and cats, but there is significantly more potential in this technology than that. Image classifiers lay the foundation for many of the great things computer vision is capable of, and we are going through a few of them.
#1 Visual search engines
Search engines have become an integral part of most people's lives. We type in keywords and get meaningful results, customized to what we were looking for.
Thanks to image classifiers, the same works for visual search as well.
Among the most popular visual search engines are big players such as Google and Bing. But there are some specialized players, too, such as TinEye and Picsearch.
In the stock photography business, visual search engines bring together photo contributors and photo buyers by making visual content discoverable via keyword search. On top of that, automatic keyword suggestions allow contributors to add a high number of accurate keywords in little time.
A visual search engine serves several use cases:
- Find images based on keywords: The classical Google image search. A user types in keywords and gets corresponding images as output.
- Get information on a particular image: A user inputs an image and gets information (text and visual) on that image. Input, for instance, the image of an unknown building. The search engine then provides you with information on the name and location of that particular building.
- Find similar images: A user inputs an image and gets similar images. Imagine, for instance, having a picture of your favorite actor. Feeding a visual search engine with this picture, you get tons of more pictures of your favorite actor - in all sorts of settings and locations.
In the visualization below you can see the results of the Google search "brown puppy":
The visual search engine (in this case Google) provides pictures that match the search request (find images based on keywords). The result page shows pictures that an image classifier has tagged with the classes "brown" and "puppy".
But it doesn't stop here. When clicking on the top left picture, the algorithm shows additional pictures on the right (find similar images). Behind the scenes, an image classifier has scanned all images in its database, e.g. for color, shape, and size. The algorithm then calculated how closely they match the clicked-on picture from the top left.
As you can see, the dogs in the resulting images look very similar: they have the same dark brown fur color and their faces have a similar shape and size.
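A common way to implement "find similar images" is to compare feature vectors (embeddings) that a classifier extracts from each image, for example with cosine similarity. The three-dimensional vectors below are invented stand-ins; real embeddings typically have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical feature direction, values near 0 mean unrelated features.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented feature dimensions, e.g. brown fur, round face, pointy ears.
query     = np.array([0.90, 0.80, 0.10])  # the clicked-on brown puppy
brown_pup = np.array([0.85, 0.75, 0.20])  # another brown puppy
black_cat = np.array([0.10, 0.30, 0.90])  # a very different image

candidates = {"brown_pup.jpg": brown_pup, "black_cat.jpg": black_cat}
ranked = sorted(candidates,
                key=lambda name: cosine_similarity(query, candidates[name]),
                reverse=True)
print(ranked)  # most similar image first
```

Ranking by similarity to the query vector is what puts the look-alike brown puppies at the top of the result list.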
#2 Logo detection: Enabling brands to do "visual listening"
Consumer brands need to know what is happening on social media as it provides valuable hints on customer behavior. Some example questions are: Has the last marketing campaign increased brand mentions? How do people interact with the brand? Who are the people that post about the brand? What do they write about it and why?
To keep track of that, brands monitor text posts for brand mentions. The sportswear manufacturer adidas, for example, would track any post including the word "adidas". This marketing method is called social listening.
The problem, however, is evident: Most of the time, people post about a brand without mentioning the brand name. They may wear the latest adidas running shoes and make a statement about it but don't say so explicitly in the image caption. Social listening can't keep track of that – but visual listening can.
Image classifiers enable brands to listen to visual content as well. By scanning through images, image classifiers can detect visual brand mentions. In analogy to social listening, this process is called visual listening.
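In its simplest form, visual listening is classification plus counting: run a logo classifier over each post's image and tally the brands it reports. The posts and predicted labels below are invented; a real pipeline would call a trained model instead of reading a prepared list.

```python
from collections import Counter

# Invented classifier output: predicted logo labels per social media post image.
predicted_logos = [
    ["adidas"],          # post 1: running shoes, brand not named in the caption
    [],                  # post 2: no logo visible
    ["adidas", "nike"],  # post 3: two brands in one photo
    ["nike"],            # post 4
]

# Tally visual brand mentions across all posts, independent of the text captions.
mentions = Counter(label for labels in predicted_logos for label in labels)
print(mentions)
```

Post 1 is the interesting case: text-based social listening would miss it entirely, while the visual tally still records the brand mention.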
By analyzing both visual and text data, brands can now more accurately conduct social media analytics. For instance, by analyzing the geographic and demographic metadata of posts they can estimate their market share within different customer segments.
Visual listening also allows brands to measure the success of hard-to-quantify marketing campaigns, such as sponsoring sports events. Take adidas again, which, for instance, sponsors the FIFA World Cup. Through visual listening, the company can better estimate the increase in brand awareness gained through this specific campaign.
Visual listening even contributes to brand protection. The technology can, for instance, detect fraudulent use of logos.
#3 Facial recognition: Replacing the boarding pass at airports
We already touched upon Apple's Face ID above but there are more applications which already enhance our daily lives, notably at modern airports.
At airports, facial recognition has the potential to substitute for the boarding pass. The technology recognizes passengers' faces and matches them with several databases to verify their identity and flight data.
Besides an enhanced traveler experience, the time-saving potential is huge. Lufthansa, for instance, conducted a test phase at several US airports in 2018. According to the firm, it took just 22 minutes to board an Airbus A380 with 350 passengers, cutting the regular 40 minutes almost by half.
In addition to faster processing, facial recognition technology helps to improve security. According to the U.S. Customs and Border Protection (CBP), facial recognition can perform security checks with "greater consistency and accuracy" compared with in-person checkpoints.
With this huge time-saving and security potential, it seems to be just a matter of time until biometric boarding becomes a standard.
What we have just discussed:
- Image classification belongs to the field of computer vision and describes the process of labeling an image according to its visual content.
- Object detection is used to identify the location of an object; the algorithms output a rectangular bounding box around the corresponding object.
- Image segmentation provides more detailed information on shapes through a pixel-wise mask for each object in the image.
- Image classification is applied in a wide range of industries and functionalities. Some of the countless use cases: Image classifiers automate and improve image tagging, enable brands to do "visual listening", replace boarding passes at airports, and are the backbone of visual search engines.
Now that you have understood what image classifiers are, you are all set to learn how they work. If you want to dive deeper into it, have a look at the second part of the article where we also linked a bunch of additional resources on this subject.