Guinea Pig Pizza: Ecuador

Image recognition (IR) systems often perform poorly once they are deployed in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Ecuador.

Key takeaway

The IR systems performed very poorly for both object detection and labeling. Cuy was not recognized at all across four images. Several (cultural) misrepresentations were present.

Correctly predicted images 0/4
Correctly detected items 0/4
Correct labels 0/89
Potentially harmful detections/labels 6
The table above includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
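
The 80% cutoff used for the summary table can be sketched in a few lines of Python. This is a minimal illustration, not the scoring procedure actually used for this post: the `count_correct` helper, the `ACCEPTED` set of names that would count as correct, and the inclusive reading of "80%+" are all assumptions. The sample labels are a subset of the Google Vision results for the first image, listed further below.

```python
# Subset of the Google Vision labels for the first image (name, confidence).
VISION_LABELS = [
    ("Food", 0.98),
    ("Laptop", 0.94),
    ("Lechona", 0.90),
    ("Chicken meat", 0.80),
    ("Roasting", 0.79),
    ("Duck meat", 0.78),
]

# Assumption: names that would count as a correct label for this meal.
ACCEPTED = {"cuy", "guinea pig"}

# Assumption: "80%+" is read as "confidence >= 0.80".
THRESHOLD = 0.80


def count_correct(labels, accepted=ACCEPTED, threshold=THRESHOLD):
    """Return (number of correct labels, number of labels kept by the threshold)."""
    kept = [(name, conf) for name, conf in labels if conf >= threshold]
    correct = sum(1 for name, _ in kept if name.lower() in accepted)
    return correct, len(kept)


correct, total = count_correct(VISION_LABELS)
print(f"Correct labels: {correct}/{total}")
```

For this subset, four labels pass the threshold and none of them names the Cuy, so the sketch reports 0/4.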

Insights

Object Detection

The object detection systems failed to accurately describe the Cuy in any of the four images. Vision gave the general description of Food for the Cuy in all images, while Rekognition gave the description of Pizza for the Cuy in all four images. As such, object detection performed very poorly.

Labeling

The labeling systems performed very poorly as well. The labels remained surface-level and did not come close to describing the meal. Perhaps the most relevant label was Fried Food. This calls into question how useful the results are for the presented meal.

Furthermore, several cultural misrepresentations were present, the most obvious one being Rekognition consistently mistaking the Cuy for pizza. With labels such as Hendl and British cuisine, Vision also gave descriptions that significantly misrepresent the meal.

As in previous analyses, here too we have to address the confusion (mostly by Vision) between meats. While Cuy is a typical dish in Ecuador and neighboring countries, people from other cultures might prefer not to eat it. Yet the systems described the meal as chicken meat, duck meat, turkey meat, pork, etc. Someone who relied too heavily on the results of these labeling systems might eat Cuy while believing it was something else.

Finally, we see that Rekognition primarily has labels for the laptop in the background. While these are obviously not wrong, I wonder whether Rekognition found it easier to return results for something more common and visually simple/distinctive, and thereby failed to give many results for the food.
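
One rough way to quantify this is to compute what share of a system's labels refer to the background rather than the meal. The sketch below does this for the Amazon Rekognition labels of the first image. The `FOOD_LABELS` grouping and the `background_share` helper are assumptions made for illustration; they are not part of any of the tested services.

```python
# Amazon Rekognition labels for the first image (name, confidence).
REKOGNITION_LABELS = [
    ("Computer Keyboard", 0.98),
    ("Hardware", 0.98),
    ("Keyboard", 0.98),
    ("Computer Hardware", 0.98),
    ("Computer", 0.98),
    ("Electronics", 0.98),
    ("Pc", 0.97),
    ("Laptop", 0.94),
    ("Food", 0.79),
    ("Pizza", 0.74),
]

# Assumption: hand-picked set of labels that refer to the meal in the foreground.
FOOD_LABELS = {"Food", "Pizza"}


def background_share(labels, food_labels=FOOD_LABELS):
    """Return the fraction of labels that do not refer to the meal."""
    if not labels:
        return 0.0
    background = [name for name, _ in labels if name not in food_labels]
    return len(background) / len(labels)


share = background_share(REKOGNITION_LABELS)
print(f"Background labels: {share:.0%}")
```

For this image, 8 of the 10 labels describe the laptop, and the two food-related labels have the lowest confidence.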

Suggestions for improvement

  • Provide more specific and relevant labels for Cuy;
  • Address (cultural) misrepresentations (e.g. Cuy is not pizza);
  • Make sure labels of meat do not harm people of certain religions or with certain diets (e.g. Cuy is not duck meat or chicken meat);
  • Check to what extent the systems can distinguish between less relevant, yet visually simple background objects and meals in the foreground (especially for Rekognition).
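
The meat-related suggestion could be checked automatically with something like the following sketch. It flags labels that name a specific kind of meat other than the ground truth. The `SPECIFIC_MEATS` mapping and the `conflicting_meat_labels` helper are hypothetical and would need to be extended for real use; the sample labels are a subset of the Google Vision results for the first image.

```python
# Subset of the Google Vision labels for the first image (name, confidence).
VISION_MEAT_LABELS = [
    ("Chicken meat", 0.80),
    ("Duck meat", 0.78),
    ("Turkey meat", 0.75),
    ("Meat", 0.74),
    ("Pork", 0.66),
    ("Flesh", 0.63),
]

# Assumption: a small, hand-made mapping from label name to the animal it implies.
SPECIFIC_MEATS = {
    "chicken meat": "chicken",
    "duck meat": "duck",
    "turkey meat": "turkey",
    "pork": "pig",
}


def conflicting_meat_labels(labels, ground_truth_animal):
    """Return labels that name a specific animal other than the ground truth."""
    flagged = []
    for name, conf in labels:
        animal = SPECIFIC_MEATS.get(name.lower())
        if animal is not None and animal != ground_truth_animal:
            flagged.append((name, conf))
    return flagged


for name, conf in conflicting_meat_labels(VISION_MEAT_LABELS, "guinea pig"):
    print(f"Potentially harmful label: {name} ({conf})")
```

Generic labels such as Meat and Flesh are deliberately not flagged here: they are too general rather than wrong.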

Results

Four images of one meal from Ecuador were available:

  • Meal 1: Cuy (Fried Guinea pig) (Lunch)

Object detection results (image 1):

| GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|---|
| Cuy | Undetected | Food (0.66) | Pizza (0.74) | / |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results (image 1):

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|
|  | Food (0.98) | Computer Keyboard (0.98) | nutrition (0.95) |
|  | Laptop (0.94) | Hardware (0.98) | food (0.95) |
|  | Lechona (0.9) | Keyboard (0.98) | reddish orange color (0.86) |
|  | Computer (0.89) | Computer Hardware (0.98) | dish (0.83) |
|  | Ingredient (0.88) | Computer (0.98) | light brown color (0.83) |
|  | Tableware (0.88) | Electronics (0.98) | Apple Pie (0.78) |
|  | Recipe (0.86) | Pc (0.97) | dessert (0.78) |
|  | Chicken meat (0.8) | Laptop (0.94) | fish and chips (0.67) |
|  | Fried food (0.8) | Food (0.79) | turnover (0.51) |
|  | Roasting (0.79) | Pizza (0.74) | samosa (0.5) |
|  | Cuisine (0.79) |  |  |
|  | Cooking (0.78) |  |  |
|  | Duck meat (0.78) |  |  |
|  | Produce (0.76) |  |  |
|  | Turkey meat (0.75) |  |  |
|  | Dish (0.75) |  |  |
|  | Meat (0.74) |  |  |
|  | Drunken chicken (0.73) |  |  |
|  | Vegetable (0.71) |  |  |
|  | Personal computer (0.7) |  |  |
|  | Fast food (0.7) |  |  |
|  | Comfort food (0.66) |  |  |
|  | Pork (0.66) |  |  |
|  | Hendl (0.63) |  |  |
|  | Flesh (0.63) |  |  |

Object detection results (image 2):

| GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|---|
| Cuy | Undetected | Food (0.73) | Pizza (0.84) | / |

Labeling results (image 2):

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|
| food_grilled (0.69) | Food (0.98) | Pc (0.97) | light brown color (0.91) |
|  | Tableware (0.93) | Electronics (0.97) | nutrition (0.87) |
|  | Laptop (0.88) | Computer (0.97) | food (0.87) |
|  | Ingredient (0.88) | Food (0.91) | fish and chips (0.87) |
|  | Recipe (0.87) | Laptop (0.88) | dish (0.87) |
|  | Computer (0.84) | Pizza (0.84) | food product (0.79) |
|  | Chicken meat (0.84) | Computer Keyboard (0.83) |  |
|  | Deep frying (0.83) | Hardware (0.83) |  |
|  | Cuisine (0.81) | Keyboard (0.83) |  |
|  | Dish (0.78) | Computer Hardware (0.83) |  |
|  | Drunken chicken (0.78) |  |  |
|  | Plate (0.78) |  |  |
|  | Produce (0.77) |  |  |
|  | Fried food (0.77) |  |  |
|  | Vegetable (0.76) |  |  |
|  | Cooking (0.75) |  |  |
|  | Hendl (0.75) |  |  |
|  | Meat (0.74) |  |  |
|  | Seafood (0.72) |  |  |
|  | Roasting (0.72) |  |  |
|  | Comfort food (0.7) |  |  |
|  | Fast food (0.7) |  |  |
|  | Duck meat (0.69) |  |  |
|  | Frying (0.67) |  |  |
|  | British cuisine (0.66) |  |  |

Object detection results (image 3):

| GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|---|
| Cuy | Undetected | Food (0.77) | Pizza (0.91) | / |

Labeling results (image 3):

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|
|  | Food (0.98) | Pc (0.99) | reddish orange color (0.95) |
|  | Computer (0.98) | Computer (0.99) | nutrition (0.75) |
|  | Laptop (0.95) | Electronics (0.99) | food (0.75) |
|  | Tableware (0.93) | Laptop (0.99) | dish (0.75) |
|  | Personal computer (0.93) | Computer Keyboard (0.97) | fish and chips (0.75) |
|  | Ingredient (0.89) | Hardware (0.97) |  |
|  | Recipe (0.86) | Keyboard (0.97) |  |
|  | Input device (0.84) | Computer Hardware (0.97) |  |
|  | Cuisine (0.83) | Pizza (0.91) |  |
|  | Dish (0.83) | Food (0.91) |  |
|  | Fast food (0.79) |  |  |
|  | Chicken meat (0.79) |  |  |
|  | Peripheral (0.77) |  |  |
|  | Fried food (0.76) |  |  |
|  | Produce (0.75) |  |  |
|  | Netbook (0.74) |  |  |
|  | Space bar (0.74) |  |  |
|  | Meat (0.74) |  |  |
|  | Drunken chicken (0.74) |  |  |
|  | Output device (0.73) |  |  |
|  | Cooking (0.72) |  |  |
|  | Junk food (0.71) |  |  |
|  | Comfort food (0.69) |  |  |
|  | Baked goods (0.68) |  |  |
|  | Touchpad (0.66) |  |  |

Object detection results (image 4):

| GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|---|
| Cuy | Undetected | Food (0.77) | Pizza (0.62) | / |

Labeling results (image 4):

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON |
|---|---|---|---|
| food_grilled (0.67) | Food (0.98) | Computer Keyboard (0.99) | fish and chips (0.92) |
|  | Computer (0.98) | Hardware (0.99) | dish (0.92) |
|  | Laptop (0.96) | Keyboard (0.99) | nutrition (0.92) |
|  | Personal computer (0.95) | Computer Hardware (0.99) | food (0.92) |
|  | Ingredient (0.88) | Computer (0.99) | reddish orange color (0.79) |
|  | Input device (0.88) | Electronics (0.99) | light brown color (0.57) |
|  | Recipe (0.87) | Pc (0.98) |  |
|  | Tableware (0.83) | Laptop (0.96) |  |
|  | Output device (0.82) | Food (0.78) |  |
|  | Chicken meat (0.81) | Pizza (0.62) |  |
|  | Cuisine (0.78) | Pork (0.58) |  |
|  | Cooking (0.78) |  |  |
|  | Office equipment (0.76) |  |  |
|  | Fried food (0.76) |  |  |
|  | Space bar (0.75) |  |  |
|  | Produce (0.75) |  |  |
|  | Dish (0.75) |  |  |
|  | Meat (0.74) |  |  |
|  | Roasting (0.73) |  |  |
|  | Plate (0.72) |  |  |
|  | Laptop part (0.71) |  |  |
|  | Deep frying (0.71) |  |  |
|  | Fried chicken (0.7) |  |  |
|  | Comfort food (0.68) |  |  |
|  | Computer hardware (0.68) |  |  |