This is not India: Yemen

Image recognition (IR) systems often perform poorly once they are deployed in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Yemen.

Key takeaway

Overall, the systems’ performances were disappointing. In one image, multiple systems labeled “Rice” (though none detected it as an object), yet they did not do so in the other image, where the rice was much more visible. Labels were often too general or irrelevant. Some labels culturally misrepresented the Yemeni food and utensils as Indian or Western. One mislabel could disadvantage people with certain diets or religions.

Correctly predicted images 0/2
Correctly detected items 0/8
Correct labels 3/35
Potentially harmful detections/labels 0
The table above includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
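The effect of the 80% cutoff can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this post: the label/confidence pairs are transcribed from the labeling tables further below, while the names `THRESHOLD`, `LABELS` and `above_cutoff` are made up for the example. Treating the three rice labels as the "correct" ones is likewise an assumption on my part.

```python
# Confidence cutoff used for the summary table (80%).
THRESHOLD = 0.80

# (label, confidence) pairs transcribed from the labeling tables in this post.
# For brevity each list stops at the first label below the cutoff, except
# Google Vision on image 2, which runs down to "Seafood".
LABELS = {
    ("image 1", "Microsoft Azure"): [
        ("Plate", 0.99), ("Table", 0.98), ("Food", 0.98), ("Indoor", 0.86),
        ("Mixture", 0.72),
    ],
    ("image 1", "Google Vision"): [
        ("Food", 0.98), ("Tableware", 0.97), ("Dishware", 0.88),
        ("Ingredient", 0.88), ("Recipe", 0.88), ("Cuisine", 0.84),
        ("Dish", 0.81), ("Staple Food", 0.81), ("Bowl", 0.79),
    ],
    ("image 1", "Amazon Rekognition"): [
        ("Bowl", 0.97), ("Breakfast", 0.91), ("Food", 0.91), ("Produce", 0.76),
    ],
    ("image 1", "IBM Watson"): [("Chestnut color", 0.68)],
    ("image 2", "Microsoft Azure"): [
        ("Food", 0.90), ("Plate", 0.88), ("Chicken", 0.72),
    ],
    ("image 2", "Google Vision"): [
        ("Food", 0.98), ("Tableware", 0.93), ("Ingredient", 0.90),
        ("Recipe", 0.87), ("White Rice", 0.85), ("Rice", 0.82),
        ("Staple Food", 0.78), ("Cuisine", 0.78), ("Dish", 0.76),
        ("Produce", 0.76), ("Meat", 0.75), ("Vegetable", 0.75),
        ("Seafood", 0.75),
    ],
    ("image 2", "Amazon Rekognition"): [
        ("Plant", 0.99), ("Dish", 0.96), ("Meal", 0.96), ("Food", 0.96),
        ("Vegetable", 0.88), ("Rice", 0.84), ("Bowl", 0.63),
    ],
    ("image 2", "IBM Watson"): [
        ("Reddish Orange Color", 0.97), ("Dish", 0.89), ("Nutrition", 0.89),
        ("Food", 0.89), ("Stew", 0.85), ("Food Product", 0.80),
        ("Tableware", 0.79),
    ],
}


def above_cutoff(entries, threshold=THRESHOLD):
    """Keep only the (label, confidence) pairs at or above the threshold."""
    return [(label, conf) for label, conf in entries if conf >= threshold]


kept = {key: above_cutoff(entries) for key, entries in LABELS.items()}

# Total number of labels that survive the cutoff across both images.
total_kept = sum(len(entries) for entries in kept.values())

# Assumption: the rice labels are the ones counted as correct in the summary.
rice_kept = [(key, label) for key, entries in kept.items()
             for label, _ in entries if "Rice" in label]

# Does the "Seafood" label survive the cutoff?
seafood_kept = any(label == "Seafood"
                   for entries in kept.values() for label, _ in entries)

print(total_kept)      # 35
print(len(rice_kept))  # 3
print(seafood_kept)    # False
```

With this cutoff, 35 labels remain, which matches the denominator in the summary. The “Seafood” label sits at 0.75, so it drops out, which is why the summary shows 0 potentially harmful labels even though the mislabel is discussed in the insights.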

Insights

Across all four systems, the object detection feature failed to provide specific descriptions of the objects. The descriptions that were given remained surface-level (e.g. “Bowl” instead of “Bowl of Chicken”, or “Bowl” instead of “Cup of Vegetable Sauce”), and many objects simply remained undetected.

Interestingly, the labeling features of Azure and Vision gave labels of Indian origin (e.g. “Masala”) to the first image. One explanation for this could be the plate on which the meal is served, which is also commonly used in Indian cuisine (a “Thali”). Perhaps the prevalence of Indian meals in the training images (e.g. because of a larger population or more common use of English) contributes to this misrepresentation of a Yemeni meal as Indian food.

Several other cultural misrepresentations were present as well. For instance, Rekognition labeled (presumably) the bowl of rice as “Oatmeal”, and IBM Watson labeled the food as “Irish Stew”. These misrepresentations should make us question whether Yemeni food was sufficiently represented in the training images.

The systems provided more correct labels on the second image (e.g. “Chicken”, “Rice”, and “Carrot”). This is an interesting outcome because, in the first image, all the items are separated into different bowls. One could assume that separate items help the systems to distinguish between them, but this was not the case. It will be interesting to see if this happens on pictures from other countries as well, as serving food as separate items is common in many Asian countries and less so in Western countries.

Finally, in one instance, Vision labeled the food as “Seafood”, which could disadvantage people with certain diets (e.g. pescatarian) or religions.

My recommendation

As stated in the analyses of previous countries, the object detection features need to become better at detecting all the objects as well as at giving more specific descriptions. The latter is similarly true for the labeling features. Also, further attention is needed to the possibility that the recognition of Yemeni food is influenced by Indian training images. Developers should also take care that the presentation of Yemeni food, and of Asian food in general (i.e. as separate items), does not impact their system’s performance. Vision appears to be very good at detecting “Carrot” (see also Vietnam), so congratulations to the developers for that.

Results

Two images of one meal from Yemen were available:

  • Meal 1: Rice, Cooked Chicken, Raw Vegetables, Vegetable Sauce (Lunch)
Object detection results, first image*:
Ground Truth Microsoft Azure Google Vision Amazon Rekognition IBM Watson
Bowl of Rice Undetected Bowl (0.87) Undetected /
Bowl of Chicken Bowl (0.58) Food (0.78) Undetected /
Cup of Vegetable Sauce Undetected Bowl (0.80) Undetected /
Plate Tableware (0.63)

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results, first image:
MICROSOFT AZURE GOOGLE VISION AMAZON REKOGNITION IBM WATSON
Plate (0.99) Food (0.98) Bowl (0.97) Chestnut color (0.68)
Table (0.98) Tableware (0.97) Breakfast (0.91) Food (0.65)
Food (0.98) Dishware (0.88) Food (0.91) Orange Color (0.62)
Indoor (0.86) Ingredient (0.88) Produce (0.76) Beverage (0.60)
Mixture (0.72) Recipe (0.88) Meal (0.71) Nutrition (0.59)
Bowl (0.70) Cuisine (0.84) Dish (0.70) Dish (0.58)
Masala (0.66) Dish (0.81) Plant (0.67) Bowl (0.55)
Spoon (0.54) Staple Food (0.81) Oatmeal (0.60) Tableware (0.55)
Tableware (0.50) Bowl (0.79) Utensil (0.55)
Mixture (0.77) Slop Bowl (0.52)
Produce (0.75)
Serveware (0.70)
Comfort Food (0.69)
Masala (0.69)
Spoon (0.68)
Kitchen Utensil (0.68)
Metal (0.67)
Rice (0.67)
Mixing Bowl (0.66)
Breakfast (0.62)
Gravy (0.61)
South Indian Cuisine (0.61)
Tandoori Masala (0.59)
Side Dish (0.59)
Plate (0.59)

Object detection results, second image:

Ground Truth Microsoft Azure Google Vision Amazon Rekognition IBM Watson
Rice Undetected Food (0.76) Undetected /
Chicken Undetected / Undetected /
Raw Vegetables Food (0.52) Carrot (0.66) Undetected /

Labeling results, second image:

MICROSOFT AZURE GOOGLE VISION AMAZON REKOGNITION IBM WATSON
Food (0.90) Food (0.98) Plant (0.99) Reddish Orange Color (0.97)
Plate (0.88) Tableware (0.93) Dish (0.96) Dish (0.89)
Chicken (0.72) Ingredient (0.90) Meal (0.96) Nutrition (0.89)
Dinner (0.65) Recipe (0.87) Food (0.96) Food (0.89)
White Rice (0.60) White Rice (0.85) Vegetable (0.88) Stew (0.85)
Steamed Rice (0.59) Rice (0.82) Rice (0.84) Food Product (0.80)
Recipe (0.55) Staple Food (0.78) Bowl (0.63) Tableware (0.79)
Glutinous Rice (0.52) Cuisine (0.78) Produce (0.56) Bouillabaisse (0.71)
Dish (0.76) Curry (0.56) Irish Stew (0.68)
Produce (0.76) Curry (0.50)
Meat (0.75)
Vegetable (0.75)
Seafood (0.75)
Stew (0.73)
Jasmine Rice (0.70)
Fast Food (0.69)
Plate (0.68)
Comfort Food (0.68)
Carrot (0.65)
Baby Carrot (0.64)
Gosht (0.64)
Mixture (0.63)
Brassicales (0.63)
Take-out Food (0.63)
Leaf Vegetable (0.62)