A bit more Wine, Please: Germany
Image recognition (IR) systems often perform poorly once they are deployed in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Germany.
Key takeaway
Overall, the systems’ performance was disappointing. On the positive side, Amazon Rekognition correctly labeled “Alcohol”, which matters because alcohol is a sensitive item in some contexts. Other than that, the systems failed to detect many objects, failed to provide specific labels, and presented many irrelevant labels.
Correctly predicted images | 0/2 |
Correctly detected items | 1/15 |
Correct labels | 1/38 |
Potentially harmful detections/labels | 0 |
Insights
Across all four systems, the object detection feature failed to describe the objects specifically. The descriptions that were given stayed at surface level (e.g. “Food” instead of “Minestrone”, or “Bottle” instead of “Bottle of Wine”), and many objects went undetected altogether. In the case of Vision, the bounding boxes from the object detection feature were scrambled and unintelligible (see picture below). It’s unclear why the bounding boxes showed up like this.
As was the case for the previous countries, the labeling feature performed slightly better, but still unsatisfactorily. Amazon Rekognition provided the labels “Alcohol” and “Wine”, which, given the sensitive nature of alcohol in certain contexts, works well. Unfortunately, the other three systems failed to label these.
Apart from “Alcohol” and “Wine”, the labels remained unspecific and uninformative. For instance, “Food”, “Plate”, “Dish”, “Kitchen Utensil” and the like were common. None of the systems detected the olives, pesto, or Minestrone. In one case, the Minestrone was labeled as “Curry”, which could be considered a (cultural) misrepresentation.
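To make this kind of check repeatable, the coverage gap can be sketched in a few lines of Python. This is only an illustration: the `missed_items` helper and its simple substring matching are my own assumptions, not part of any of the four APIs. The labels are the Microsoft Azure labels for the second image, taken from the Results section.

```python
def missed_items(ground_truth, labels):
    """Return ground-truth items that no label mentions.

    Matching is a case-insensitive substring test, which is a
    simplifying assumption made for this sketch.
    """
    names = [name.lower() for name, _confidence in labels]
    return [item for item in ground_truth
            if not any(item.lower() in name for name in names)]


# Microsoft Azure labels for the second image (from the Results section).
azure_labels = [
    ("Table", 0.99), ("Plate", 0.99), ("Food", 0.99), ("Indoor", 0.98),
    ("Wall", 0.98), ("Bottle", 0.90), ("Drink", 0.75), ("Tableware", 0.65),
    ("Counter", 0.60), ("Dish", 0.53),
]

missed = missed_items(["Minestrone", "Pesto", "Olive"], azure_labels)
print(missed)
```

For these labels, all three items come back as missed, which matches the observation above.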
My recommendation
As in the previous analyses, both the object detection and labeling features need much improvement. For object detection, this means detecting the various items in the first place and then describing them more specifically. The labeling feature of Amazon Rekognition correctly identified “Alcohol” and “Wine”, both sensitive items in certain contexts, and the other three systems would perhaps do well to also identify “Alcohol”.
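As a rough illustration of how a downstream application could use such labels, here is a minimal sketch that flags sensitive labels in a system's output. The `SENSITIVE_LABELS` watch list, the `flag_sensitive` helper, and the 0.5 threshold are all assumptions made for this example. The input is the first ten Amazon Rekognition labels for the second image, taken from the Results section.

```python
# Hypothetical watch list; what counts as sensitive depends on the context.
SENSITIVE_LABELS = {"alcohol", "wine", "beer"}


def flag_sensitive(labels, min_confidence=0.5):
    """Return (name, confidence) pairs that are on the watch list
    and meet the confidence threshold."""
    return [(name, confidence) for name, confidence in labels
            if name.lower() in SENSITIVE_LABELS and confidence >= min_confidence]


# First ten Amazon Rekognition labels for the second image (from the Results section).
rekognition_labels = [
    ("Dish", 0.99), ("Meal", 0.99), ("Food", 0.99), ("Spoon", 0.92),
    ("Cutlery", 0.92), ("Alcohol", 0.74), ("Beverage", 0.74), ("Drink", 0.74),
    ("Person", 0.67), ("Human", 0.67),
]

flagged = flag_sensitive(rekognition_labels)
print(flagged)
```

With these inputs only “Alcohol” is flagged. Raising `min_confidence` above 0.74 would drop it, so the threshold is a design choice that trades missed items against false alarms.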
Results
Two images of one meal from Germany were available:
- Meal 1: Minestrone with Pesto and Olives, and a glass of Wine (Dinner)
Object detection results*:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
---|---|---|---|---|
Bowl of Minestrone | Bowl (0.66) | Scrambled* | Undetected | / |
Bowl of Pesto | Bowl (0.58) | Scrambled* | Undetected | / |
Spoon | Undetected | Scrambled* | Undetected | / |
Glass of wine | Undetected | Scrambled* | Undetected | / |
Cup of Olives | Undetected | Scrambled* | Undetected | / |
Wine Bottle | Bottle (0.56) | Scrambled* | Undetected | / |
Spoon | Undetected | Scrambled* | Undetected | / |
*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
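The color scheme can be approximated in code. The sketch below is not how the tables in this post were graded (that was done by hand). The `grade` function, its word-overlap rule for "too general", and the `harmful` parameter are assumptions introduced for illustration.

```python
def grade(ground_truth, predicted, harmful=frozenset()):
    """Map a prediction to a legend color.

    green  = exact match with the ground truth
    yellow = every word of the prediction appears in the ground truth
             (e.g. "Bowl" for "Bowl of Minestrone"), i.e. too general
    red    = prediction is in the caller-supplied set of harmful labels
    white  = anything else, including "Undetected"
    """
    truth = ground_truth.lower()
    pred = predicted.lower().strip()
    if not pred:
        return "white"
    if pred in harmful:
        return "red"
    if pred == truth:
        return "green"
    if set(pred.split()) <= set(truth.split()):
        return "yellow"
    return "white"


print(grade("Bowl of Minestrone", "Bowl"))
print(grade("Spoon", "Spoon"))
print(grade("Spoon", "Undetected"))
```

A word-overlap rule is crude: it would not recognize “Food” as a too-general match for “Minestrone”, so a real grader would need a label hierarchy or a human in the loop.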
Labeling results:
MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
---|---|---|---|
Food (0.98) | Food (0.98) | Dish (0.95) | Olive Green Color (0.80) |
Plate (0.89) | Tableware (0.97) | Meal (0.95) | Orange Color (0.66) |
| | Ingredient (0.90) | Food (0.95) | Food (0.65) |
| | Dishware (0.89) | Plant (0.73) | Dish (0.65) |
| | Recipe (0.88) | Pottery (0.68) | Nutrition (0.65) |
| | Serveware (0.85) | Alcohol (0.58) | Tableware (0.60) |
| | Kitchen Utensil (0.84) | Beverage (0.58) | Side Dish (0.59) |
| | Cuisine (0.83) | Drink (0.58) | Curry (0.51) |
| | Dish (0.82) | | Garnish (0.50) |
| | Rectangle (0.80) | | |
| | Vegetable (0.78) | | |
| | Leaf Vegetable (0.78) | | |
| | Soup (0.77) | | |
| | Produce (0.74) | | |
| | Mixture (0.74) | | |
| | Curry (0.70) | | |
| | Comfort Food (0.69) | | |
| | Spoon (0.69) | | |
| | Yellow Curry (0.68) | | |
| | Cutlery (0.68) | | |
| | Circle (0.68) | | |
| | Garnish (0.68) | | |
| | Condiment (0.67) | | |
| | Plate (0.65) | | |
| | Stew (0.65) | | |
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
---|---|---|---|---|
Bowl of Minestrone | Undetected | Food (0.78) | Undetected | / |
Bowl of Minestrone | Bowl (0.75) | Food (0.68) | Undetected | / |
Bowl of Pesto | Kitchen Utensil (0.52) | Food (0.55) | Undetected | / |
Spoon | Undetected | Undetected | Undetected | / |
Glass of wine | Undetected | Undetected | Undetected | / |
Glass of wine | Cup (0.58) | Undetected | Undetected | / |
Wine Bottle | Bottle (0.84) | Packaged Goods | Beer | / |
Spoon | Undetected | Undetected | Spoon | / |
Labeling results:
MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
---|---|---|---|
Table (0.99) | Food (0.99) | Dish (0.99) | Nutrition (0.74) |
Plate (0.99) | Tableware (0.97) | Meal (0.99) | Food (0.74) |
Food (0.99) | Table (0.95) | Food (0.99) | Charcoal Color (0.62) |
Indoor (0.98) | Bottle (0.94) | Spoon (0.92) | Dish (0.60) |
Wall (0.98) | Dog (0.93) | Cutlery (0.92) | Piece de Resistance Dish (0.59) |
Bottle (0.90) | Plate (0.90) | Alcohol (0.74) | Table (0.55) |
Drink (0.75) | Dishware (0.90) | Beverage (0.74) | Furniture (0.55) |
Tableware (0.65) | Ingredient (0.89) | Drink (0.74) | Dining Table (0.54) |
Counter (0.60) | Recipe (0.87) | Person (0.67) | Dinner Table (0.53) |
Dish (0.53) | Houseplant (0.85) | Human (0.67) | Plate (0.52) |
| | Leaf vegetable (0.83) | Stew (0.65) | |
| | Kitchen Utensil (0.78) | Beer (0.62) | |
| | Vegetable (0.78) | Pasta (0.60) | |
| | Cooking (0.77) | Curry (0.58) | |
| | Cuisine (0.76) | Glass (0.58) | |
| | Companion Dog (0.75) | Wine (0.58) | |
| | Broccoli (0.75) | Restaurant (0.55) | |
| | Produce (0.74) | | |
| | Serveware (0.74) | | |
| | Garnish (0.74) | | |
| | Dish (0.73) | | |
| | Bowl (0.73) | | |
| | Comfort Food (0.71) | | |
| | Fork (0.70) | | |
| | Culinary Art (0.70) | | |