Finally some good Curry: Myanmar

Image recognition (IR) systems often perform poorly once deployed in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Myanmar.

Key takeaway

Overall, the systems performed poorly, though all of them except IBM Watson labeled “Rice” correctly. Microsoft Azure and Google Vision were also a bit more specific, correctly labeling “Rice and curry”. Unfortunately, many labels were too general, irrelevant or clear misrepresentations. Microsoft Azure and Google Vision also mislabeled the places of origin, which could be sensitive and harmful to some.

Correctly predicted images: 0/3
Correctly detected items: 0/8
Correct labels: 6/56
Potentially harmful detections/labels: 4
The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
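As a minimal sketch of how such a cut-off could be applied, the snippet below filters a list of labels at the 0.80 level. It uses the IBM Watson labels for the first meal, copied from the results further below; the function name `filter_by_confidence` and the data layout are illustrative assumptions, not part of any vendor API.

```python
# IBM Watson labels for the first meal (Rice with Fish Curry),
# copied from the labeling results in this post.
watson_meal_1 = [
    ("Nutrition", 0.89),
    ("Food", 0.89),
    ("Dish", 0.89),
    ("Food product", 0.80),
    ("Tableware", 0.79),
    ("Porcupine ball", 0.72),
    ("Fried rice", 0.66),
    ("Fried Calamari", 0.50),
]


def filter_by_confidence(labels, threshold=0.80):
    """Keep only (label, confidence) pairs at or above the threshold."""
    return [(label, score) for label, score in labels if score >= threshold]


high_confidence = filter_by_confidence(watson_meal_1)
for label, score in high_confidence:
    print(f"{label}: {score:.2f}")
```

With these values, four of the eight labels remain. “Food product” at exactly 0.80 is kept because the comparison is inclusive, while “Tableware” at 0.79 is dropped.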

Insights

The object detection features performed very poorly on the selected images: not a single item was correctly identified. Most of the items remained undetected, and the systems identified only a handful in a general manner. In two instances, the detected items were (cultural) misrepresentations (e.g. “popcorn” instead of “rice”) that could lead to harm. This leads me to conclude that the object detection features are, in this case, severely lacking.

The labeling features identified more items, but conversely also (culturally) misrepresented many more items, which could lead to harm. “Rice” appeared easy to detect by all systems except Watson. Azure and Vision were also able to more specifically detect “Rice and curry” as well as “Steamed Rice”. These two systems also identified multiple varieties of rice (e.g. Jasmine), although the rice was actually simply a local variety of Myanmarese rice. Of course, it would be very difficult even for humans to identify the variety of rice based on these pictures.

The systems, especially Vision, also suggested many more labels than they did for, for example, Belgium. This may be related to the higher number of correctly identified labels (as described in the previous paragraph).

Unfortunately, more labels also seem to come with more (cultural) misrepresentations. While for Belgium these misrepresentations mainly consisted of wrongly naming a dish or ingredient, the case for Myanmar seems more severe, especially with Azure and Vision. For both these systems, dishes were given a (wrong) place of origin (e.g. “Sri Lankan Cuisine”, “Chinese food”, “Japanese curry”, “Takikomi Gohan”, etc.). Depending on the context, these misrepresentations could become sensitive and harmful to some (e.g. cultural appropriation). Of course, the presented meals were quite common across different countries and cultures, which could mitigate harm. Also, the confidence rates were generally quite low (between 0.5 and 0.6) for these types of predictions.

Finally, the second meal included two images. Strangely, the systems performed quite differently on these two images, for object detection as well as labeling. Different objects were detected and significantly different labels were given. Of course, in one image the Chicken Chili Curry with Mango Salad was on the rice itself, while in the other it was still in a plastic delivery bag (difficult to recognize even for humans). This could have influenced the labels. However, it would not explain why Vision had a significantly lower confidence (-0.16) for “Rice” in the image where the rice was clearly more visible.
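The comparison above can be reproduced with a small sketch that subtracts per-label confidences between the two images. The values are a handful of Google Vision labels copied from the two labeling tables for the second meal further below. The variable and function names are illustrative, and the mapping of tables to images (the table containing “Oven Bag” being the delivery-bag image) is an assumption.

```python
# A few Google Vision labels for the two images of the second meal, copied
# from the labeling results in this post. Assumption: the table that
# contains "Oven Bag" corresponds to the image with the delivery bag.
vision_curry_on_rice = {"Food": 0.98, "Tableware": 0.92, "Produce": 0.77, "Rice": 0.70}
vision_delivery_bag = {"Food": 0.97, "Tableware": 0.73, "Produce": 0.59, "Rice": 0.54}


def confidence_deltas(first, second):
    """Return second-minus-first confidence for labels present in both images."""
    shared = first.keys() & second.keys()
    return {label: round(second[label] - first[label], 2) for label in sorted(shared)}


deltas = confidence_deltas(vision_curry_on_rice, vision_delivery_bag)
for label, delta in deltas.items():
    print(f"{label}: {delta:+.2f}")
```

Rounding to two decimals avoids floating-point noise, so the “Rice” entry comes out as -0.16.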

My recommendation

Providing more labels perhaps comes with more correct predictions, but also with many more wrong predictions and misrepresentations. Developers should find a balance between the two. Developers should also be careful when providing origins (e.g. “Sri Lankan cuisine”) of the meals as they, in this case, clearly did not match, leading to cultural misrepresentation. As was the case for Belgium, predictions should become more specific, as they currently often miss a lot of nuance.
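One way to explore this balance is to sweep the confidence threshold and count how many labels, and how many place-of-origin labels, survive at each level. The sketch below does this for the Microsoft Azure labels of the first meal. The set of origin labels is taken from the examples named in the Insights section; the function name and the chosen thresholds are illustrative assumptions.

```python
# Microsoft Azure labels for the first meal, copied from the labeling
# results in this post.
azure_meal_1 = [
    ("Food", 0.99), ("Plate", 0.98), ("Jasmine rice", 0.92), ("Indoor", 0.91),
    ("White rice", 0.90), ("Steamed Rice", 0.88), ("Rice", 0.82),
    ("Rice and curry", 0.78), ("Arborio rice", 0.65),
    ("Sri Lankan Cuisine", 0.63), ("Japanese curry", 0.56),
    ("Takikomi Gohan", 0.55), ("Spiced rice", 0.52),
]

# Labels that the Insights section lists as wrong places of origin.
ORIGIN_LABELS = {"Sri Lankan Cuisine", "Japanese curry", "Takikomi Gohan"}


def sweep_thresholds(labels, thresholds):
    """For each threshold, count the labels kept and the origin labels among them."""
    summary = {}
    for threshold in thresholds:
        kept = [label for label, score in labels if score >= threshold]
        origin_kept = [label for label in kept if label in ORIGIN_LABELS]
        summary[threshold] = (len(kept), len(origin_kept))
    return summary


summary = sweep_thresholds(azure_meal_1, [0.5, 0.6, 0.7, 0.8])
for threshold, (total, origin) in summary.items():
    print(f"threshold {threshold:.1f}: {total} labels kept, {origin} origin labels")
```

With these numbers, raising the threshold from 0.5 to 0.7 removes all three origin labels while keeping “Rice and curry” (0.78); raising it further to 0.8 drops that label as well.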

Results

Images of two different meals from Myanmar were available:

  • Meal 1: Rice with Fish Curry (lunch)
  • Meal 2: Rice with Chicken Chili Curry (lunch)

Object detection results for Meal 1*:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| --- | --- | --- | --- | --- |
| Bowl of Rice | Tableware (0.772) | Food (0.69) | Undetected | / |
| Fish Curry | / | / | / | / |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results:

| Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| --- | --- | --- | --- |
| Food (0.99) | Food (0.98) | Plant (0.98) | Nutrition (0.89) |
| Plate (0.98) | White rice (0.94) | Food (0.89) | Food (0.89) |
| Jasmine rice (0.92) | Tableware (0.90) | Produce (0.89) | Dish (0.89) |
| Indoor (0.91) | Jasmine Rice (0.90) | Vegetable (0.89) | Food product (0.80) |
| White rice (0.90) | Rice (0.88) | Dish (0.83) | Tableware (0.79) |
| Steamed Rice (0.88) | Staple Food (0.88) | Dish (0.83) | Porcupine ball (0.72) |
| Rice (0.82) | Ingredient (0.88) | Meal (0.83) | Fried rice (0.66) |
| Rice and curry (0.78) | Recipe (0.87) | Lentil (0.72) | Fried Calamari (0.50) |
| Arborio rice (0.65) | Glutinous Rice (0.87) | Bean (0.72) | |
| Sri Lankan Cuisine (0.63) | Basmati (0.86) | Sweets (0.58) | |
| Japanese curry (0.56) | Cuisine (0.79) | Confectionery (0.58) | |
| Takikomi Gohan (0.55) | Steamed Rice (0.78) | Breakfast (0.57) | |
| Spiced rice (0.52) | Produce (0.78) | | |
| | Dish (0.76) | | |
| | Arborio Rice (0.76) | | |
| | Xôi (0.74) | | |
| | Comfort Food (0.70) | | |
| | Chana Masala (0.70) | | |
| | vegetable (0.70) | | |
| | Meat (0.68) | | |
| | Rice and Curry (0.66) | | |
| | Stew (0.66) | | |
| | Ghungi (0.61) | | |
| | Indian Cuisine (0.57) | | |
| | Koresh (0.55) | | |

Object detection results for Meal 2 (first image):

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| --- | --- | --- | --- | --- |
| Bowl of Rice | Undetected | Bowl (0.67) | Undetected | / |
| Chicken Chili Curry with Mango Salad | Undetected | Food (0.68) | Undetected | / |
| Garlic | Undetected | Undetected | Undetected | / |
| Spoon | Undetected | Undetected | Undetected | / |

Labeling results:

| Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| --- | --- | --- | --- |
| Food (0.99) | Food (0.98) | Plant (0.99) | Pale yellow color (0.77) |
| Table (0.98) | Tableware (0.92) | Vegetable (0.92) | Utensil (0.71) |
| Plate (0.97) | Staple food (0.88) | Food (0.96) | Spoon (0.70) |
| Bowl (0.77) | Ingredient (0.88) | Produce (0.96) | Emerald color (0.68) |
| Fast food (0.71) | Recipe (0.88) | Sprout (0.80) | Ladle (0.61) |
| Dish (0.51) | Cuisine (0.86) | Bean Sprout (0.61) | Food product (0.60) |
| | Mixture (0.83) | Grain (0.60) | Food (0.60) |
| | Dish (0.80) | Lentil (0.59) | Scoop (0.60) |
| | Produce (0.77) | Bean (0.59) | Tableware (0.59) |
| | Rice (0.70) | Rice (0.59) | Tablespoon (0.57) |
| | Comfort Food (0.67) | Meal (0.58) | |
| | Superfood (0.67) | | |
| | Bowl (0.65) | | |
| | Spoon (0.64) | | |
| | Cutlery (0.64) | | |
| | Stuffing (0.64) | | |
| | Kitchen Utensil (0.63) | | |
| | Meat (0.61) | | |
| | Cooking (0.58) | | |
| | Food Additive (0.58) | | |
| | Breakfast cereal (0.57) | | |
| | Fast Food (0.56) | | |
| | Vegetable (0.55) | | |
| | Chinese Food (0.55) | | |
| | Thai Food (0.55) | | |

Object detection results for Meal 2 (second image):

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| --- | --- | --- | --- | --- |
| Bowl of Rice | Popcorn (0.78) | Packaged goods (0.69) | Undetected | / |
| Chicken Chili Curry with Mango Salad | Undetected | Packaged goods (0.90) | Pineapple | / |

Labeling results:

| Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| --- | --- | --- | --- |
| Table (0.99) | Food (0.97) | Food (0.94) | Alabaster color (1) |
| Food (0.99) | Ingredient (0.90) | Rice (0.84) | Shellfish (0.55) |
| Plate (0.93) | Cuisine (0.86) | Produce (0.61) | Invertebrate (0.55) |
| | Recipe (0.86) | Pineapple (0.61) | Animal (0.55) |
| | Dish (0.85) | Fruit (0.61) | Seasnail (0.55) |
| | Staple food (0.85) | | Gastropod (0.55) |
| | Tableware (0.73) | | Common limpet (0.54) |
| | Chemical Compound (0.69) | | Succulent (0.53) |
| | Vegetable (0.67) | | Plant (0.53) |
| | Comfort Food (0.65) | | Feather ball (0.53) |
| | Plant (0.62) | | |
| | Produce (0.59) | | |
| | Oven Bag (0.58) | | |
| | Fashion Accessory (0.58) | | |
| | Jasmine Rice (0.57) | | |
| | Rice (0.54) | | |
| | Dairy (0.52) | | |