A bit more Wine, Please: Germany

Image recognition (IR) systems often perform poorly once deployed in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Germany.

Key takeaway

Overall, the systems’ performance was disappointing. On the positive side, Amazon Rekognition did correctly label “Alcohol”, which is valuable given alcohol’s sensitive nature in some contexts. Other than that, the systems failed to detect many objects, failed to provide specific labels, and presented many irrelevant labels.

| Correctly predicted images | 0/2 |
| Correctly detected items | 1/15 |
| Correct labels | 1/38 |
| Potentially harmful detections/labels | 0 |
The table above includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
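The 80% cut-off amounts to a simple filter over each system's (label, confidence) pairs. Here is a minimal sketch in Python, using the Microsoft Azure labels reported for the second image further below. The function name `filter_by_confidence` and the data layout are illustrative assumptions, not part of any vendor's API.

```python
# Labels copied from the Microsoft Azure column of the second labeling table.
azure_labels = [
    ("Table", 0.99), ("Plate", 0.99), ("Food", 0.99), ("Indoor", 0.98),
    ("Wall", 0.98), ("Bottle", 0.90), ("Drink", 0.75), ("Tableware", 0.65),
    ("Counter", 0.60), ("Dish", 0.53),
]


def filter_by_confidence(labels, threshold=0.80):
    """Keep only (label, confidence) pairs at or above the threshold.

    Hypothetical helper for illustration; the threshold mirrors the
    80% cut-off used for the summary figures.
    """
    return [(name, conf) for name, conf in labels if conf >= threshold]


high_confidence = filter_by_confidence(azure_labels)
print([name for name, _ in high_confidence])
```

With this cut-off, “Drink” (0.75) and the lower-scored labels drop out, which is why some labels visible in the detailed tables do not count toward the summary.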

Insights

Across all four systems, the object detection feature failed to provide specific descriptions of the objects. The descriptions that were given remained surface-level (e.g. “Food” instead of “Minestrone”, or “Bottle” instead of “Bottle of Wine”), and many objects simply remained undetected. In the case of Google Vision, the bounding boxes of the object detection feature were all scrambled and unintelligible (see picture below). It’s unclear why the bounding boxes showed up like this.

As was the case for the previous countries, the labeling feature performed slightly better, but still unsatisfactorily. Amazon Rekognition provided the labels “Alcohol” and “Wine”, which, given the sensitive nature of alcohol in certain contexts, works well. Unfortunately, the other three systems failed to label these.

Apart from “Alcohol” and “Wine”, the labels remained very unspecific and uninformative. For instance, “Food”, “Plate”, “Dish”, “Kitchen Utensil” and the like were common. None of the systems detected the olives, pesto, or Minestrone. In one case, the Minestrone was labeled as “Curry”, which could be considered a (cultural) misrepresentation.

My recommendation

As in the previous analyses, both the object detection and labeling features need much improvement. For object detection, this means first being able to detect the various items at all, and second giving more specific descriptions. The labeling feature of Amazon Rekognition correctly identified “Alcohol” and “Wine” (both sensitive items in certain contexts), and the other three systems would perhaps do well to also implement the identification of “Alcohol”.

Results

Two images of one meal from Germany were available:

  • Meal 1: Minestrone with Pesto and Olives, and a glass of Wine (Dinner)
Object detection results*:
| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Minestrone | Bowl (0.66) | Scrambled* | Undetected | / |
| Bowl of Pesto | Bowl (0.58) | Scrambled* | Undetected | / |
| Spoon | Undetected | Scrambled* | Undetected | / |
| Glass of wine | Undetected | Scrambled* | Undetected | / |
| Cup of Olives | Undetected | Scrambled* | Undetected | / |
| Wine Bottle | Bottle (0.56) | Scrambled* | Undetected | / |
| Spoon | Undetected | Scrambled* | Undetected | / |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results:
| Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Food (0.98) | Food (0.98) | Dish (0.95) | Olive Green Color (0.80) |
| Plate (0.89) | Tableware (0.97) | Meal (0.95) | Orange Color (0.66) |
|  | Ingredient (0.90) | Food (0.95) | Food (0.65) |
|  | Dishware (0.89) | Plant (0.73) | Dish (0.65) |
|  | Recipe (0.88) | Pottery (0.68) | Nutrition (0.65) |
|  | Serveware (0.85) | Alcohol (0.58) | Tableware (0.60) |
|  | Kitchen Utensil (0.84) | Beverage (0.58) | Side Dish (0.59) |
|  | Cuisine (0.83) | Drink (0.58) | Curry (0.51) |
|  | Dish (0.82) |  | Garnish (0.50) |
|  | Rectangle (0.80) |  |  |
|  | Vegetable (0.78) |  |  |
|  | Leaf Vegetable (0.78) |  |  |
|  | Soup (0.77) |  |  |
|  | Produce (0.74) |  |  |
|  | Mixture (0.74) |  |  |
|  | Curry (0.70) |  |  |
|  | Comfort Food (0.69) |  |  |
|  | Spoon (0.69) |  |  |
|  | Yellow Curry (0.68) |  |  |
|  | Cutlery (0.68) |  |  |
|  | Circle (0.68) |  |  |
|  | Garnish (0.68) |  |  |
|  | Condiment (0.67) |  |  |
|  | Plate (0.65) |  |  |
|  | Stew (0.65) |  |  |

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Minestrone | Undetected | Food (0.78) | Undetected | / |
| Bowl of Minestrone | Bowl (0.75) | Food (0.68) | Undetected | / |
| Bowl of Pesto | Kitchen Utensil (0.52) | Food (0.55) | Undetected | / |
| Spoon | Undetected | Undetected | Undetected | / |
| Glass of wine | Undetected | Undetected | Undetected | / |
| Glass of wine | Cup (0.58) | Undetected | Undetected | / |
| Wine Bottle | Bottle (0.84) | Packaged Goods | Beer | / |
| Spoon | Undetected | Undetected | Spoon | / |

Labeling results:

| Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Table (0.99) | Food (0.99) | Dish (0.99) | Nutrition (0.74) |
| Plate (0.99) | Tableware (0.97) | Meal (0.99) | Food (0.74) |
| Food (0.99) | Table (0.95) | Food (0.99) | Charcoal Color (0.62) |
| Indoor (0.98) | Bottle (0.94) | Spoon (0.92) | Dish (0.60) |
| Wall (0.98) | Dog (0.93) | Cutlery (0.92) | Piece de Resistance Dish (0.59) |
| Bottle (0.90) | Plate (0.90) | Alcohol (0.74) | Table (0.55) |
| Drink (0.75) | Dishware (0.90) | Beverage (0.74) | Furniture (0.55) |
| Tableware (0.65) | Ingredient (0.89) | Drink (0.74) | Dining Table (0.54) |
| Counter (0.60) | Recipe (0.87) | Person (0.67) | Dinner Table (0.53) |
| Dish (0.53) | Houseplant (0.85) | Human (0.67) | Plate (0.52) |
|  | Leaf vegetable (0.83) | Stew (0.65) |  |
|  | Kitchen Utensil (0.78) | Beer (0.62) |  |
|  | Vegetable (0.78) | Pasta (0.60) |  |
|  | Cooking (0.77) | Curry (0.58) |  |
|  | Cuisine (0.76) | Glass (0.58) |  |
|  | Companion Dog (0.75) | Wine (0.58) |  |
|  | Broccoli (0.75) | Restaurant (0.55) |  |
|  | Produce (0.74) |  |  |
|  | Serveware (0.74) |  |  |
|  | Garnish (0.74) |  |  |
|  | Dish (0.73) |  |  |
|  | Bowl (0.73) |  |  |
|  | Comfort Food (0.71) |  |  |
|  | Fork (0.70) |  |  |
|  | Culinary Art (0.70) |  |  |