Discover in images how computers see the world

As part of the personal experiments I run to understand what works and what does not before bringing innovation to the company I work for, I explored artificial intelligence, at least as it is available in 2017 to a non-specialist developer like me.

I created a website to view images of various objects and places, together with the interpretations made by the AI algorithms.

AI algorithms can be surprisingly accurate and detailed in some situations, but I quickly found a real lack of consistency. After viewing a batch of images with the descriptions generated by the algorithms, I could see some patterns and could almost predict some errors. For example, a train will be mentioned as soon as some smoke and a horizontal shape are visible in the picture, or a clock tower as soon as something round is visible on a building.

Move the mouse over the images to view the descriptions generated by the algorithms.

All misinterpretations can be viewed here:

How it works

Microsoft Cognitive Services includes a vision API that takes an image and returns a text description and a list of keywords.
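To make this concrete, here is a minimal sketch of how such a call could be prepared and how the answer could be read. It is not the code behind this website. The `/analyze` path, the `visualFeatures` parameter, the `Ocp-Apim-Subscription-Key` header and the response fields (`description.captions`, `tags`) reflect my understanding of the Computer Vision REST API and should be checked against the official reference; the function names `build_analyze_request` and `summarize` are my own.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, subscription_key, image_url):
    """Build (but do not send) a request asking for a caption and keywords.

    `endpoint` is assumed to look like
    https://<region>.api.cognitive.microsoft.com/vision/v1.0
    """
    query = urllib.parse.urlencode({"visualFeatures": "Description,Tags"})
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/analyze?{query}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def summarize(response):
    """Return the most confident caption and the (keyword, confidence) pairs.

    `response` is the decoded JSON answer; the field names are assumptions.
    """
    captions = response.get("description", {}).get("captions", [])
    best = max(captions, key=lambda c: c["confidence"], default=None)
    keywords = [(t["name"], t["confidence"]) for t in response.get("tags", [])]
    return (best["text"] if best else None), keywords
```

Sending the request is then a matter of passing it to `urllib.request.urlopen`, decoding the body with `json.load` and handing the result to `summarize`.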

The algorithm compares the image with multiple images in a database:

  • Images in the database are associated with keywords
  • The image to analyse is compared with the images in the database, and the keywords of the most similar images are ranked by how often they occur.
  • Based on the score (or confidence value) of the keywords, a sentence is built in natural language.
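The three steps above can be sketched in a few lines. This is only a toy model of the idea as I describe it, not Microsoft's actual implementation: the feature vectors, the cosine similarity, the `k` nearest images, the 0.5 threshold and all the names (`DATABASE`, `rank_keywords`, `describe`) are assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical database: (feature vector, keywords) for each known image.
DATABASE = [
    ([1.0, 0.9, 0.0], ["train", "smoke"]),
    ([0.9, 1.0, 0.1], ["train", "track"]),
    ([0.8, 0.7, 0.0], ["smoke", "factory"]),
    ([0.0, 0.1, 1.0], ["clock", "tower"]),
]


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_keywords(query, database, k=3):
    """Score each keyword by how often it occurs among the k most similar images."""
    neighbours = sorted(
        database, key=lambda item: cosine(query, item[0]), reverse=True
    )[:k]
    counts = Counter(kw for _, keywords in neighbours for kw in keywords)
    return [(kw, n / len(neighbours)) for kw, n in counts.most_common()]


def describe(ranked, threshold=0.5):
    """Build a naive sentence from the keywords whose score passes the threshold."""
    kept = [kw for kw, score in ranked if score >= threshold]
    if not kept:
        return "no confident description"
    return "a picture of " + " and ".join(kept)


# An image that only shares some "smoke" and "horizontal shape" features
# with the known images.
ranked = rank_keywords([1.0, 0.8, 0.0], DATABASE)
sentence = describe(ranked)
```

With a model like this, any image that resembles the smoky, horizontal pictures in the database inherits the keyword "train", whether or not a train is actually there, which matches the kind of predictable error described earlier.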

To get a better understanding, you can browse the 2017 section to see multiple images with the description, the keywords and the confidence value for each of them.

Would you let those algorithms drive your car?

Overall, the descriptions may look good, even impressive for some images. However, when driving a car, a single error or misinterpretation can result in a serious crash. Also, the algorithms have to deal with moving objects, not with static images.

As soon as a few parameters are introduced (light and shadows, an unusual point of view or unusual shapes), the algorithms become unpredictable.

If Microsoft's vision APIs are really close to the state of the art (and I think they are), then personally, I will not let those algorithms drive my car until I see progress.

Every year the same images will be processed by the latest algorithms, so we will be able to compare the results and see how fast things are moving.


Almost correct interpretations:

Correct interpretations:
