There’s more than Rice: Malaysia

Image recognition (IR) systems often perform poorly once in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Malaysia.

Key takeaway

Overall, the systems’ performances were very disappointing. Object detection was not able to accurately detect any part of a meal, while labeling detected rice and a fork, but was too general for other items. One instance of cultural misrepresentation was also found.

| Correctly predicted images | 0/1 |
| Correctly detected items | 0/7 |
| Correct labels | 2/19 |
| Potentially harmful detections/labels | 0 |

The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
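To make the 80% cut-off concrete, here is a minimal sketch of how the label counts in the summary could be reproduced. The label lists are abbreviated from the Malaysia labeling table further below; the function name `summarize` and the `CORRECT` set are illustrative assumptions, and which labels count as correct remains a manual judgment rather than something the APIs return.

```python
# Minimal sketch of the 80% cut-off used for the summary table.
# Label lists are abbreviated: every label at or above the threshold from the
# Malaysia labeling table, plus the first label below it for each system.
THRESHOLD = 0.80

LABELS = {
    "Microsoft Azure": [("Food", 0.99), ("Table", 0.96), ("Plate", 0.95),
                        ("Tableware", 0.79)],
    "Google Vision": [("Food", 0.99), ("Tableware", 0.97), ("Ingredient", 0.91),
                      ("White Rice", 0.90), ("Recipe", 0.88), ("Cuisine", 0.86),
                      ("Plate", 0.86), ("Kitchen Utensil", 0.86), ("Fork", 0.85),
                      ("Dish", 0.85), ("Nasi Kandar", 0.81), ("Produce", 0.79)],
    "Amazon Rekognition": [("Plant", 0.90), ("Food", 0.90), ("Breakfast", 0.87),
                           ("Bowl", 0.83), ("Meal", 0.81), ("Vegetation", 0.77)],
    "IBM Watson": [("Nutrition", 0.71)],
}

# Assumption: which labels count as "correct" is a hand-made judgment.
CORRECT = {("Google Vision", "White Rice"), ("Google Vision", "Fork")}


def summarize(labels, correct, threshold=THRESHOLD):
    """Return (correct_count, total_count) for labels at or above threshold."""
    kept = [(system, name)
            for system, items in labels.items()
            for name, score in items
            if score >= threshold]
    hits = sum(1 for pair in kept if pair in correct)
    return hits, len(kept)


print(summarize(LABELS, CORRECT))
```

With the data above, this yields 2 correct labels out of 19 at the 80% level, in line with the summary; lowering `threshold` pulls in the less confident labels shown in the full tables.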

Insights

The object detection features performed very poorly on the selected image. Vision used only the very general terms “Food” and “Tableware” to describe items such as the “Bowl of Rice”, “Steamed Egg” and “Spoon”. Furthermore, “Cucumber Soup” was detected as “Plate” and “Chinese Iced Tea” as “Tableware”. All other systems failed to detect anything.

Vision’s labeling feature performed better, but was still unsatisfactory. For instance, it labeled “White Rice”, “Fork”, “Steamed Rice” and “Meat”, but failed to detect the “Chinese Iced Tea”, “Cucumber Soup”, “Steamed Egg”, etc. The other systems performed much worse and were only able to label in a very general manner (e.g. “Food”, “Plate”, “Tableware”, etc.).

Watson also culturally misrepresented the food and labeled it as “Taco”.

My recommendation

Developers should use more specific labels in order for the system to be useful. As the object detection system did not work well at all, a deeper analysis of what went wrong is needed.

Results

An image of one meal from Malaysia was available:

  • Meal 1: Rice, Steamed Egg, Sweet and Sour Pork, Cucumber Soup, Chinese Iced Tea (Dinner)
Object detection results*:
| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Rice | Undetected | Food (0.73) | Undetected | / |
| Steamed Egg | Undetected | Food (0.72) | Undetected | / |
| Sweet and Sour Pork | Undetected | / | Undetected | / |
| Spoon | Undetected | Tableware (0.56) | Undetected | / |
| Cucumber Soup | Undetected | Plate (0.84) | Undetected | / |
| Chinese Iced Tea | Undetected | Tableware (0.76) | Undetected | / |
| Fork | Undetected | Undetected | Undetected | / |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results:
| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.99) | Plant (0.90) | Nutrition (0.71) |
| Table (0.96) | Tableware (0.97) | Food (0.90) | Food (0.71) |
| Plate (0.95) | Ingredient (0.91) | Breakfast (0.87) | Dish (0.71) |
| Tableware (0.79) | White Rice (0.90) | Bowl (0.83) | Taco (0.70) |
| Breakfast (0.66) | Recipe (0.88) | Meal (0.81) | Shop (0.68) |
| Dish (0.61) | Cuisine (0.86) | Vegetation (0.77) | Retail Store (0.68) |
| Fast food (0.60) | Plate (0.86) | Dish (0.69) | Building (0.68) |
| Dinner (0.57) | Kitchen Utensil (0.86) | Lunch (0.66) | Reddish Brown Color (0.66) |
| Bowl (0.52) | Fork (0.85) | Produce (0.65) | Light Brown Color (0.64) |
|  | Dish (0.85) | Vegetable (0.57) | Food Product (0.60) |
|  | Nasi Kandar (0.81) |  |  |
|  | Produce (0.79) |  |  |
|  | Staple Food (0.76) |  |  |
|  | Jasmine Rice (0.74) |  |  |
|  | Meat (0.72) |  |  |
|  | Steamed Rice (0.71) |  |  |
|  | Serveware (0.69) |  |  |
|  | Vegetable (0.69) |  |  |
|  | Comfort Food (0.68) |  |  |
|  | Rice (0.67) |  |  |
|  | Fried Food (0.67) |  |  |
|  | Lime (0.66) |  |  |
|  | Condiment (0.66) |  |  |
|  | Hayashi Rice (0.63) |  |  |
|  | Papadum (0.61) |  |  |

Where are the Chopsticks? Vietnam

Image recognition (IR) systems often perform poorly once in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Vietnam.

Key takeaway

Overall, the systems’ performances were disappointing. Object detection failed miserably across all four systems. Labeling worked slightly better: rice was labeled in the first meal, but generally not in the other two meals. Overall, the systems missed a lot of nuance and provided predictions that could harm religious people and vegetarians/vegans.

| Correctly predicted images | 0/4 |
| Correctly detected items | 1/16 |
| Correct labels | 9/95 |
| Potentially harmful detections/labels | 0 |

The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.

Insights

The object detection features performed very poorly on the selected images: only one piece of carrot and one spoon were correctly identified (by Azure and Rekognition, respectively); other items remained largely undetected (especially by Rekognition) or were described in too general a manner.

As usual, the labeling features identified more items, but were also severely lacking in detail. For the first meal, the rice and vegetables were detected, though only Vision detected both. Nuances such as “fried vegetables” or “mixed vegetables” were not present. The soup was detected by Vision and Watson, though neither recognized it as “Bok Choy Soup”. Rekognition was able to label a spoon, which the other systems missed. The chopsticks were not labeled.

For the second meal, rice was detected only by Vision, which is surprising as rice seemed easily detectable in the previous meal and other cases. Perhaps this was due to the rather unusual and cut-off position the rice was in, though this is just a guess. As is common, Vision provided more labels, but this led to clear (cultural) misrepresentations (e.g. labels of “Guk” and “Sauerkraut”). One misrepresentation by Vision could, in some contexts, also cause harm to religious people or vegetarians/vegans: mock meat was labeled as meat with a moderately high confidence score (0.73).

Finally, for the third meal, only “Tomato”, “Iceburg Lettuce” [sic], and “Grilling” were identified; the chicken and rice were not. As with the mock meat, the chicken was actually labeled as both “Beef” and “Steak”, which could cause harm or confusion to religious people. Also, as was the case in an image of the first meal, the chopsticks were not labeled. It will be interesting to see (in future analyses of other countries) whether this is due to cultural bias or is simply a coincidence. Rice was strangely not detected in the image though it is clearly visible; perhaps this is due to its position in the image.

My recommendation

As in previous analyses, I recommend developers use more specific labels and make sure that the specific labels they use don’t cause any misrepresentation. In this case, mock meat and types of real meats were interchanged. Unfortunately, this could harm religious people and vegetarians/vegans. Developers should also be sure to check if their systems work properly on chopsticks and not just spoons and forks, though further analysis is needed for this. Finally, Vision developers should probably fix the typo in the “Iceburg Lettuce” label.
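A check for the mock meat case could look roughly like the sketch below. It is an illustration only: `MEAT_TERMS` is a hypothetical and deliberately tiny word list, `flag_meat_labels` is a made-up helper, and the labels are a subset of Google Vision's output for Meal 2 taken from the labeling table further below.

```python
# Sketch: flag labels that suggest animal products on a dish known to be
# meat-free. MEAT_TERMS is a hypothetical word list, not part of any API.
MEAT_TERMS = {"meat", "beef", "steak", "pork", "chicken"}


def flag_meat_labels(labels, min_confidence=0.5):
    """Return (label, score) pairs that name a meat term at or above the cut-off."""
    return [(name, score) for name, score in labels
            if score >= min_confidence and name.lower() in MEAT_TERMS]


# Subset of Google Vision's labels for Meal 2, which contained mock meat.
vision_meal_2 = [("Food", 0.99), ("Leaf Vegetable", 0.84), ("White Rice", 0.82),
                 ("Meat", 0.73), ("Rice", 0.73), ("Soup", 0.70)]

print(flag_meat_labels(vision_meal_2))
```

For a meal known to be vegetarian, any flagged label (here “Meat” at 0.73) would be a candidate for manual review before it is shown to users.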

Results

Images of three different meals from Vietnam were available:

  • Meal 1: Rice, Bok Choy Soup and Fried Mixed Vegetables (Dinner)
  • Meal 2: Rice, Cucumber Soup, Mock Meat and Lettuce (Lunch)
  • Meal 3: Rice, Grilled Chicken, and raw vegetables (Dinner)
Object detection results*:
| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Rice | Undetected | Food (0.77) | Undetected | / |
| Fried Mixed Vegetables | Carrot (0.66) | Food (0.72) | Undetected | / |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results:
| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.99) | Plant (0.99) | Olive Green Color (0.96) |
| Rice (0.98) | Tableware (0.96) | Vegetables (0.96) | Grain (0.94) |
| Jasmine Rice (0.75) | White Rice (0.92) | Food (0.96) | Food Product (0.94) |
| White Rice (0.73) | Ingredient (0.91) | Produce (0.92) | Food (0.94) |
| Steamed Rice (0.70) | Staple Food (0.88) | Bowl (0.57) | Rice (0.83) |
| Glutinous Rice (0.66) | Recipe (0.88) | Cutlery (0.57) | White Rice (0.80) |
| Cooking (0.60) | Jasmine Rice (0.87) |  | Wild Rice (0.62) |
| Ingredient (0.58) | Cuisine (0.87) |  |  |
| Cuisine (0.56) | Rice (0.86) |  |  |
| Cheese (0.55) | Dish (0.86) |  |  |
| Homemade (0.52) | Produce (0.79) |  |  |
|  | Vegetable (0.77) |  |  |
|  | Steamed Rice (0.76) |  |  |
|  | Garnish (0.75) |  |  |
|  | Bowl (0.75) |  |  |
|  | Plate (0.75) |  |  |
|  | Leaf Vegetable (0.73) |  |  |
|  | Salad (0.69) |  |  |
|  | Basmati (0.69) |  |  |
|  | Glutinous Rice (0.67) |  |  |
|  | Prepackaged Meal (0.67) |  |  |
|  | Carrot (0.66) |  |  |
|  | Brassicales (0.65) |  |  |
|  | Meal (0.62) |  |  |
|  | Supper (0.61) |  |  |

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Rice | Bowl | Food (0.77) | Undetected | / |
| Fried Mixed Vegetables | Carrot (0.53) | Undetected | Undetected | / |
| Chopsticks | Kitchen Utensil (0.70) | Undetected | Undetected | / |
| Spoon | Kitchen Utensil (0.71) | Undetected | Undetected | / |
| Bok Choy soup | Bowl | Food (0.75) | Undetected | / |

Labeling results:

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.98) | Plant (0.99) | Olive Green Color (0.96) |
| Plate (0.89) | Tableware (0.96) | Produce (0.98) | Dish (0.91) |
| Bowl (0.86) | Ingredient (0.92) | Food (0.98) | Nutrition (0.91) |
| Vegetable (0.75) | Staple Food (0.88) | Vegetable (0.95) | Food (0.91) |
| Meal (0.53) | Recipe (0.88) | Dish (0.90) | Greenishness Color (0.85) |
|  | White Rice (0.88) | Meal (0.90) | Food Product (0.80) |
|  | Rice (0.86) | Bowl (0.90) | Utensil (0.80) |
|  | Cuisine (0.86) | Sprout (0.68) | Salad (0.69) |
|  | Leaf Vegetable (0.85) | Seasoning (0.57) | Seaweed Salad (0.69) |
|  | Dish (0.84) |  | Soup (0.68) |
|  | Jasmine Rice (0.79) |  |  |
|  | Produce (0.78) |  |  |
|  | Vegetable (0.77) |  |  |
|  | Soup (0.76) |  |  |
|  | Bowl (0.76) |  |  |
|  | Mixing Bowl (0.73) |  |  |
|  | Steamed Rice (0.70) |  |  |
|  | Plate (0.70) |  |  |
|  | Comfort Food (0.69) |  |  |
|  | Dishware (0.68) |  |  |
|  | Stock (0.68) |  |  |
|  | Cooking (0.67) |  |  |
|  | Namul (0.67) |  |  |
|  | Garnish (0.66) |  |  |
|  | Kitchen Utensil (0.66) |  |  |

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Rice | Undetected | Food (0.75) | Undetected | / |
| Spoon | Kitchen Utensil (0.78) | Undetected | Spoon | / |
| Lettuce | Undetected | Undetected | Undetected | / |
| Mock Meat | Food (0.52) | Food (0.78) | Undetected | / |
| Mock Meat | Undetected | Food (0.74) | Undetected | / |
| Cucumber Soup | Bowl (0.67) | Food (0.78) | Undetected | / |

Labeling results:

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.99) | Dish (0.95) | Olive Green Color (0.86) |
| Plate (0.99) | Tableware (0.97) | Meal (0.95) | Food (0.85) |
| Table (0.98) | Ingredient (0.91) | Food (0.95) | Nutrition (0.85) |
| Salad (0.74) | Recipe (0.88) | Plant (0.94) | Dish (0.85) |
| Broccoli (0.72) | Staple Food (0.88) | Bowl (0.93) | Food Product (0.79) |
| Container (0.54) | Fines Herbes (0.86) | Spoon (0.90) | Side Dish (0.76) |
| Dinner (0.52) | Dishware (0.86) | Cutlery (0.90) | Bottle Green Color (0.51) |
| Cruciferous Vegetable (0.50) | Cuisine (0.85) | Vegetable (0.71) | Mushy Peas (0.50) |
|  | Dish (0.84) | Produce (0.69) |  |
|  | Leaf Vegetable (0.84) | Seasoning (0.57) |  |
|  | White Rice (0.82) |  |  |
|  | Produce (0.80) |  |  |
|  | Vegetable (0.79) |  |  |
|  | Plate (0.78) |  |  |
|  | Bowl (0.78) |  |  |
|  | Jasmine Rice (0.76) |  |  |
|  | Glutinous Rice (0.75) |  |  |
|  | Meat (0.73) |  |  |
|  | Rice (0.73) |  |  |
|  | Comfort Food (0.72) |  |  |
|  | Guk (0.72) |  |  |
|  | Sauerkraut (0.71) |  |  |
|  | Soup (0.70) |  |  |
|  | Basmati (0.70) |  |  |
|  | Garnish (0.69) |  |  |

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Rice | Bowl (0.69) | Tableware (0.76) | Undetected | / |
| Bowl of Rice | Bowl (0.69) | Tableware (0.66) | Undetected | / |
| Raw Vegetables | Undetected | Food (0.71) | Undetected | / |
| Chopsticks | Undetected | Undetected | Undetected | / |
| Grilled Chicken | Undetected | Food (0.73) | Undetected | / |

Labeling results:

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.99) | Plant (0.97) | Nutrition (0.86) |
| Fast Food (0.90) | Tableware (0.94) | Meal (0.91) | Food (0.86) |
| Healthy (0.89) | Ingredient (0.91) | Food (0.91) | Dish (0.86) |
| Recipe (0.79) | Recipe (0.88) | Dish (0.89) | Chestnut Red Color (0.80) |
| Fresh (0.78) | Plate (0.83) | Vegetable (0.81) | Chestnut Color (0.75) |
| Fruit (0.78) | Cuisine (0.82) | Produce (0.71) | Teriyaki (0.73) |
| Tomato (0.69) | Dish (0.82) | Bowl (0.70) | Sukiyaki (0.55) |
| Delicious (0.65) | Leaf Vegetable (0.81) | Dinner (0.57) | Barbecued Spareribs (0.50) |
| Container (0.63) | Vegetable (0.79) | Supper (0.57) |  |
| Broccoli (0.61) | Produce (0.78) | Vase (0.57) |  |
| Carrot (0.58) | Garnish (0.76) | Pottery (0.57) |  |
| Ingredient (0.57) | Natural Foods (0.75) | Jar (0.57) |  |
| Dish (0.54) | Beef (0.74) |  |  |
| Salad (0.54) | Steak (0.74) |  |  |
| Produce (0.54) | Cooking (0.74) |  |  |
| Tasty (0.54) | Meat (0.73) |  |  |
| Diet (0.52) | Citrus (0.73) |  |  |
|  | Iceburg Lettuce (0.70) |  |  |
|  | Comfort Food (0.69) |  |  |
|  | Roasting (0.68) |  |  |
|  | Grilling (0.68) |  |  |
|  | Lime (0.67) |  |  |
|  | Fast Food (0.67) |  |  |
|  | Lemon (0.67) |  |  |
|  | Fruit (0.66) |  |  |

Finally some good Curry: Myanmar

Image recognition (IR) systems often perform poorly once in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Myanmar.

Key takeaway

Overall, the systems performed poorly, though all except IBM Watson labeled “Rice” correctly. Microsoft Azure and Google Vision were also a bit more specific, correctly labeling “Rice and curry”. Unfortunately, many labels were too general, irrelevant or clear misrepresentations. Microsoft Azure and Google Vision also mislabeled the places of origin, which could be sensitive and harmful to some.

| Correctly predicted images | 0/3 |
| Correctly detected items | 0/8 |
| Correct labels | 6/56 |
| Potentially harmful detections/labels | 4 |

The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.

Insights

The object detection features performed very poorly on the selected images: not a single item was correctly identified. Most of the items remained undetected, and the systems identified only a handful in a general manner. In two instances, the detected items were (cultural) misrepresentations (e.g. “popcorn” instead of “rice”) that could lead to harm. This leads me to conclude that the object detection features are, in this case, severely lacking.

The labeling features identified more items, but conversely also (culturally) misrepresented many more items, which could lead to harm. “Rice” appeared easy to detect by all systems except Watson. Azure and Vision were also able to more specifically detect “Rice and curry” as well as “Steamed Rice”. These two systems also identified multiple varieties of rice (e.g. Jasmine), although the rice was actually simply a local variety of Myanmarese rice. Of course, it would be very difficult even for humans to identify the variety of rice based on these pictures.

The systems, especially Vision, also suggested a lot more labels than, for example, for Belgium. This may correlate with why more labels were correctly identified (as described in the previous paragraph).

Unfortunately, more labels also seem to come with more (cultural) misrepresentations. While for Belgium these misrepresentations mainly consisted of wrongly naming a dish or ingredient, the case for Myanmar seems more severe, especially with Azure and Vision. For both these systems, dishes were given a (wrong) place of origin (e.g. “Sri Lankan Cuisine”, “Chinese food”, “Japanese curry”, “Takikomi Gohan”, etc.). Depending on the context, these misrepresentations could become sensitive and harmful to some (e.g. cultural appropriation). Of course, the presented meals were quite common across different countries and cultures, which could mitigate harm. Also, the confidence rates were generally quite low (between 0.5 and 0.6) for these types of predictions.

Finally, the second meal included two images. Strangely, the systems performed quite differently on these two images, for object detection as well as labeling. Different objects were detected and significantly different labels were given. Of course, in one image the Chicken Chili Curry with Mango Salad was on the rice itself while in the other it was still in a plastic delivery bag (difficult to recognize even for humans). This could have influenced the labels. However, it would not explain why Vision had a significantly lower confidence score (-0.16) for “Rice” in the image where the rice was clearly more visible.
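The difference between the two images of the second meal can be checked with a small comparison like the one below. It is a sketch only: the scores are a subset of Google Vision's labels copied from the two Meal 2 labeling tables further below, and the helper names are hypothetical.

```python
# Sketch: compare Google Vision's labels for the two images of Meal 2.
# Scores are a subset copied from the two labeling tables; names are illustrative.
FIRST_IMAGE = {"Food": 0.98, "Tableware": 0.92, "Staple food": 0.88,
               "Ingredient": 0.88, "Recipe": 0.88, "Cuisine": 0.86,
               "Mixture": 0.83, "Dish": 0.80, "Rice": 0.70,
               "Comfort Food": 0.67}
SECOND_IMAGE = {"Food": 0.97, "Ingredient": 0.90, "Cuisine": 0.86,
                "Recipe": 0.86, "Dish": 0.85, "Staple food": 0.85,
                "Tableware": 0.73, "Comfort Food": 0.65, "Oven Bag": 0.58,
                "Rice": 0.54}


def confidence_deltas(first, second):
    """Score change (second minus first) for labels present in both images."""
    return {name: round(second[name] - first[name], 2)
            for name in first.keys() & second.keys()}


def only_in(first, second):
    """Labels that appear in `first` but not in `second`, sorted by name."""
    return sorted(first.keys() - second.keys())


deltas = confidence_deltas(FIRST_IMAGE, SECOND_IMAGE)
print(deltas["Rice"])
print(only_in(SECOND_IMAGE, FIRST_IMAGE))
```

The shared “Rice” label drops by 0.16 from the first image to the second, while labels such as “Oven Bag” appear in only one of them.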

My recommendation

Providing more labels perhaps comes with more correct predictions, but also with many more wrong predictions and misrepresentations. Developers should find a balance between the two. Developers should also be careful when providing origins (e.g. “Sri Lankan cuisine”) for the meals, as in this case they clearly did not match, leading to cultural misrepresentation. As was the case for Belgium, predictions should become more specific, as they currently often miss a lot of nuance.
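One way to look at this balance is to vary the confidence cut-off and see which labels survive. The sketch below does this for Microsoft Azure's labels on Meal 1, copied from the labeling table further below; the function name and the `ORIGIN_LABELS` set (the origin-related labels named in the Insights section) are assumptions made for illustration.

```python
# Sketch: how a confidence cut-off trades label count against origin labels.
# Microsoft Azure's labels for Meal 1, copied from the labeling table.
AZURE_MEAL_1 = [("Food", 0.99), ("Plate", 0.98), ("Jasmine rice", 0.92),
                ("Indoor", 0.91), ("White rice", 0.90), ("Steamed Rice", 0.88),
                ("Rice", 0.82), ("Rice and curry", 0.78), ("Arborio rice", 0.65),
                ("Sri Lankan Cuisine", 0.63), ("Japanese curry", 0.56),
                ("Takikomi Gohan", 0.55), ("Spiced rice", 0.52)]

# Assumption: labels treated here as asserting a (wrong) place of origin.
ORIGIN_LABELS = {"Sri Lankan Cuisine", "Japanese curry", "Takikomi Gohan"}


def split_by_threshold(labels, threshold):
    """Return (kept label names, kept names that are origin labels)."""
    kept = [name for name, score in labels if score >= threshold]
    origin = [name for name in kept if name in ORIGIN_LABELS]
    return kept, origin


for threshold in (0.50, 0.65, 0.80):
    kept, origin = split_by_threshold(AZURE_MEAL_1, threshold)
    print(threshold, len(kept), origin)
```

On this one example, a cut-off of 0.65 removes all three origin labels while keeping “Rice and curry”, whereas 0.80 also drops “Rice and curry”. A single image is of course not enough to pick a threshold.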

Results

Images of two different meals from Myanmar were available:

  • Meal 1: Rice with Fish Curry (lunch)
  • Meal 2: Rice with Chicken Chili Curry (lunch)
Object detection results*:
| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Rice | Tableware (0.772) | Food (0.69) | Undetected | / |
| Fish Curry | / | / | / | / |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results:
| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.98) | Plant (0.98) | Nutrition (0.89) |
| Plate (0.98) | White rice (0.94) | Food (0.89) | Food (0.89) |
| Jasmine rice (0.92) | Tableware (0.90) | Produce (0.89) | Dish (0.89) |
| Indoor (0.91) | Jasmine Rice (0.90) | Vegetable (0.89) | Food product (0.80) |
| White rice (0.90) | Rice (0.88) | Dish (0.83) | Tableware (0.79) |
| Steamed Rice (0.88) | Staple Food (0.88) | Dish (0.83) | Porcupine ball (0.72) |
| Rice (0.82) | Ingredient (0.88) | Meal (0.83) | Fried rice (0.66) |
| Rice and curry (0.78) | Recipe (0.87) | Lentil (0.72) | Fried Calamari (0.50) |
| Arborio rice (0.65) | Glutinous Rice (0.87) | Bean (0.72) |  |
| Sri Lankan Cuisine (0.63) | Basmati (0.86) | Sweets (0.58) |  |
| Japanese curry (0.56) | Cuisine (0.79) | Confectionery (0.58) |  |
| Takikomi Gohan (0.55) | Steamed Rice (0.78) | Breakfast (0.57) |  |
| Spiced rice (0.52) | Produce (0.78) |  |  |
|  | Dish (0.76) |  |  |
|  | Arborio Rice (0.76) |  |  |
|  | Xôi (0.74) |  |  |
|  | Comfort Food (0.70) |  |  |
|  | Chana Masala (0.70) |  |  |
|  | Vegetable (0.70) |  |  |
|  | Meat (0.68) |  |  |
|  | Rice and Curry (0.66) |  |  |
|  | Stew (0.66) |  |  |
|  | Ghungi (0.61) |  |  |
|  | Indian Cuisine (0.57) |  |  |
|  | Koresh (0.55) |  |  |

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Rice | Undetected | Bowl (0.67) | Undetected | / |
| Chicken Chili Curry with Mango Salad | Undetected | Food (0.68) | Undetected | / |
| Garlic | Undetected | Undetected | Undetected | / |
| Spoon | Undetected | Undetected | Undetected | / |

Labeling results:

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.98) | Plant (0.99) | Pale yellow color (0.77) |
| Table (0.98) | Tableware (0.92) | Vegetable (0.92) | Utensil (0.71) |
| Plate (0.97) | Staple food (0.88) | Food (0.96) | Spoon (0.70) |
| Bowl (0.77) | Ingredient (0.88) | Produce (0.96) | Emerald color (0.68) |
| Fast food (0.71) | Recipe (0.88) | Sprout (0.80) | Ladle (0.61) |
| Dish (0.51) | Cuisine (0.86) | Bean Sprout (0.61) | Food product (0.60) |
|  | Mixture (0.83) | Grain (0.60) | Food (0.60) |
|  | Dish (0.80) | Lentil (0.59) | Scoop (0.60) |
|  | Produce (0.77) | Bean (0.59) | Tableware (0.59) |
|  | Rice (0.70) | Rice (0.59) | Tablespoon (0.57) |
|  | Comfort Food (0.67) | Meal (0.58) |  |
|  | Superfood (0.67) |  |  |
|  | Bowl (0.65) |  |  |
|  | Spoon (0.64) |  |  |
|  | Cutlery (0.64) |  |  |
|  | Stuffing (0.64) |  |  |
|  | Kitchen Utensil (0.63) |  |  |
|  | Meat (0.61) |  |  |
|  | Cooking (0.58) |  |  |
|  | Food Additive (0.58) |  |  |
|  | Breakfast cereal (0.57) |  |  |
|  | Fast Food (0.56) |  |  |
|  | Vegetable (0.55) |  |  |
|  | Chinese Food (0.55) |  |  |
|  | Thai Food (0.55) |  |  |

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Bowl of Rice | Popcorn (0.78) | Packaged goods (0.69) | Undetected | / |
| Chicken Chili Curry with Mango Salad | Undetected | Packaged goods (0.90) | Pineapple | / |

Labeling results:

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Table (0.99) | Food (0.97) | Food (0.94) | Alabaster color (1) |
| Food (0.99) | Ingredient (0.90) | Rice (0.84) | Shellfish (0.55) |
| Plate (0.93) | Cuisine (0.86) | Produce (0.61) | Invertebrate (0.55) |
|  | Recipe (0.86) | Pineapple (0.61) | Animal (0.55) |
|  | Dish (0.85) | Fruit (0.61) | Seasnail (0.55) |
|  | Staple food (0.85) |  | Gastropod (0.55) |
|  | Tableware (0.73) |  | Common limpet (0.54) |
|  | Chemical Compound (0.69) |  | Succulent (0.53) |
|  | Vegetable (0.67) |  | Plant (0.53) |
|  | Comfort Food (0.65) |  | Feather ball (0.53) |
|  | Plant (0.62) |  |  |
|  | Produce (0.59) |  |  |
|  | Oven Bag (0.58) |  |  |
|  | Fashion Accessory (0.58) |  |  |
|  | Jasmine Rice (0.57) |  |  |
|  | Rice (0.54) |  |  |
|  | Dairy (0.52) |  |  |

Of course there’s Beer: Belgium

Image recognition (IR) systems often perform poorly once in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Belgium.

Key takeaway

Overall, the systems performed poorly, though Amazon Rekognition stood out by providing almost all necessary labels for two meals. Nonetheless, none of the meals were fully detected or labeled. IBM Watson appeared to be the most specific, but unfortunately mostly with the wrong labels.

| Correctly predicted images | 0/3 |
| Correctly detected items | 5/35 |
| Correct labels | 6/61 |
| Harmful detections/labels | 1 |

The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.

Insights

Across all systems, kitchen utensils such as forks, knives and cups appeared to be more easily recognized than the food itself. In terms of the food, the systems mostly used general terms (e.g. food, bottle, cup, etc.) and failed to provide specifics (e.g. fries, beer bottle, cup of coffee, etc.). As such, it appears that the systems were not prepared for the visual complexities meals inherently present.

While food recognition systems present no immediate harm, (cultural) misrepresentations were common (e.g. labeling vegetables as custard or creme brulee [sic]). IBM Watson was most guilty of misrepresentation, but at the same time was also the only system to provide more specific labels. In this sense, the IR systems need to be much more specific in order to be useful, but developers should be careful as this also opens up the space for harm through misrepresentation.

While the labeling features of these systems had some merit to them, the object detection feature of these systems often disappointed. They commonly failed to detect the meals at all and were too general in their description if they did.

Finally, some forks and knives remained largely undetected though they were clearly visible and recognizable to the human eye. Though not explicitly tested, unfamiliar lighting conditions in the images may have had an impact on this.

My recommendation

Developers of all four systems need to significantly increase system performance. For Azure, Vision and Rekognition this means providing more specific labels, while for Watson this means getting the specific labels right. A lot of (cultural) nuance is currently lost in these systems.

Results

Images of three different meals from Belgium were available:

  • Meal 1: Fries and peanut sauce with a Vedett Beer (lunch)
  • Meal 2: Oatmeal, coffee, oranges, and mixed fruit (breakfast)
  • Meal 3: Baked tofu, white beans, lettuce, sliced tomato, grated carrots (lunch)

Object detection results*:
| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Beer bottle | Bottle (0.772) | Packaged goods (0.88) | Undetected | / |
| Fries with peanut sauce | Food (0.61) | Food (0.51) | Ice cream | / |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling results:
| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Food (0.99) | Food (0.98) | Beer (0.88) | Nutrition (0.85) |
| Fast food (0.98) | Bottle (0.93) | Alcohol (0.88) | Food (0.85) |
| Indoor (0.95) | Tableware (0.91) | Bottle (0.93) | Food product (0.79) |
| Drink (0.80) | Ingredient (0.88) | Drink (0.88) | Meal (0.77) |
| Bottle (0.76) | Staple food (0.87) | Food (0.84) | Chocolate color (0.66) |
| Snack (0.66) | Recipe (0.86) | Dish (0.83) | Waffles (0.65) |
|  |  | Meal (0.83) |  |
|  |  | Fries (0.82) |  |

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Cup of coffee | Cup | Coffee cup | Undetected | / |
| Oranges | Undetected | Tableware | Undetected | / |
| Oatmeal | Bowl | Tableware | Undetected | / |
| Mixed fruit | Bowl | Food | Undetected | / |
| Spoon | Spoon | Undetected | Spoon | / |
| Spoon | Kitchen Utensil | Tableware | Undetected | / |

Labeling results:

MICROSOFT AZURE GOOGLE VISION AMAZON REKOGNITION IBM WATSON
Indoor (0.98) Food (0.98) Spoon (0.99) Pale yellow color (0.94)
Food (0.97) Tableware (0.97) Cutlery (0.99) Food (0.86)
Bowl (0.80) Dishware (0.93) Breakfast (0.92) Nutrition (0.81)
Snack (0.69) Ingredient (0.91) Food (0.92) Beige color (0.70)
Breakfast (0.58) Mixing bowl (0.89) Bowl (0.89) Dish (0.67)
Mixing bowl (0.54) Drinkware (0.89) Coffee cup (0.85) Dessert (0.63)
Serveware (0.87) Cup (0.85) Donuts (0.63)
Cuisine (0.85) Oatmeal (0.57) Fried Calamari (0.56)
Cup (0.85) Food (0.98)

Object detection results:

| Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson |
| Carrots and tomatoes | Food | Food | Undetected | / |
| Beans, lettuce and tofu | Undetected | Food | Undetected | / |
| Fork | Undetected | Undetected | Fork | / |
| Knife | Undetected | Undetected | Undetected | / |

Labeling results:

| MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOGNITION | IBM WATSON |
| Plate (0.99) | Food (0.98) | Plant (0.99) | Nutrition (0.80) |
| Table (0.99) | Tableware (0.95) | Produce (0.95) | Food (0.80) |
| Food (0.97) | Dishware (0.92) | Food (0.95) | Dish (0.80) |
| Indoor (0.88) | Ingredient (0.90) | Vegetable (0.90) | Beef Tartare (0.62) |
| Dessert (0.82) | Recipe (0.88) | Lentil (0.74) | Food product (0.60) |
| Recipe (0.80) | Liquid (0.84) | Pottery (0.62) | Creme brulee (0.55) [sic] |
| Delicious (0.79) | Cuisine (0.83) | Vegetation (0.60) | Custard (0.55) |
| Chocolate (0.69) | Kitchen Utensil (0.81) | Dish (0.60) | Risotto (0.50) |