I’ve tested the IR systems on five new countries: the Philippines, Canada, the US, Yemen, and Germany (6 meals and 13 images). The results were poor across all countries. If correct at all, detection and labeling descriptions overall remained too general (e.g. Food, Dish, etc.).
Results in numbers:
Correctly predicted images: 0/13 (0%)
Correctly detected items: 1/45 (2%)
Correct labels: 14/285 (5%)
Potentially harmful detections/labels: 44
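As a rough check on the arithmetic, the overall figures can be rebuilt by summing the per-country counts reported in the country write-ups further down this page. The sketch below is only an illustration: the dictionary layout and the function name `aggregate` are my own choices, and percentages are rounded to whole numbers.

```python
# Per-country counts as reported in the country write-ups on this page
# (80%+ confidence only): (correct, total) pairs plus the harmful count.
COUNTRY_RESULTS = {
    "Philippines": {"images": (0, 3), "items": (0, 9), "labels": (4, 104), "harmful": 35},
    "Canada": {"images": (0, 5), "items": (0, 10), "labels": (5, 89), "harmful": 9},
    "US": {"images": (0, 1), "items": (0, 3), "labels": (1, 19), "harmful": 0},
    "Yemen": {"images": (0, 2), "items": (0, 8), "labels": (3, 35), "harmful": 0},
    "Germany": {"images": (0, 2), "items": (1, 15), "labels": (1, 38), "harmful": 0},
}


def aggregate(results):
    """Sum the (correct, total) pairs per metric and add a rounded percentage."""
    totals = {}
    for metric in ("images", "items", "labels"):
        correct = sum(r[metric][0] for r in results.values())
        total = sum(r[metric][1] for r in results.values())
        totals[metric] = (correct, total, round(100 * correct / total))
    totals["harmful"] = sum(r["harmful"] for r in results.values())
    return totals


for metric, value in aggregate(COUNTRY_RESULTS).items():
    print(metric, value)
```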
Insights
Watson, Rekognition and, to a lesser degree, Azure seem trigger-happy in labeling food items (e.g. Rice and Kadyos) as Ice Cream. Is this due to a (Western) bias towards ice cream? It is too early to tell, but it is nonetheless an interesting direction to explore in future updates.
Though Rice was labeled as Ice Cream multiple times, at other times Rice seemed to be one of the most easily detected items, especially for Azure and Vision. So this leads us to wonder where the threshold for Rice and Ice Cream is located.
Unfortunately, potentially harmful descriptions were common, especially in the form of misplaced references to the origin of the meals as well as confusion between types of meats. More blatant forms of cultural misrepresentation were found too. For example, Azure assigned 18 different sausage types to an image of spring rolls.
Finally, with labels such as Gluten and Sugar, one has to wonder what we can realistically expect from IR systems. How can these systems possibly know if a Bagel is gluten- or sugar-free without any context? Even most humans would find this task impossible.
Suggested improvements:
Provide more specific and relevant labels for:
BBQ, [Cup of] BBQ sauce, Kadyos, sliced melon, crab cake, spring rolls, bagel, minestrone, bottle of wine, olives and pesto
Fix (cultural) misrepresentations:
Rice, Kadyos and Crab Cakes are not ice cream, sliced melon is not a banana, spring rolls are not sausages, rice is not oatmeal;
A Thali (Indian serving plate) in an image possibly skews Yemeni food results towards Indian food results.
Understand the limits of IR systems and think about the consequences of these limits:
Can we expect IR systems to distinguish between similar dishes of different countries without further context or input?
Can we expect IR systems to detect if, for example, a meal is gluten- or sugar-free simply based on an image without further context or input?
Update: 17/06
So far, I’ve tested the systems on food from Belgium, Myanmar, Vietnam and Malaysia (9 meals and 11 images). With these admittedly limited samples, I wanted to write an interim conclusion to make the results of the project easily digestible.
Thus far, the results have been disappointing to very disappointing. Object detection consistently failed across the four systems. The systems labeled the images better, though these labels also often included too general or irrelevant descriptions.
In some cases, these descriptions culturally misrepresented the food (e.g. by mistaking the country or culture of the dish), which could be controversial in some contexts. In one instance, “Chicken” was described as “Beef”, which in some contexts could disadvantage people from certain religions. In another instance, “Mock Meat” was labeled as “Meat”, which could similarly disadvantage people from certain religions as well as vegetarians/vegans.
Note: all numbers on this page come only from detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the countries’ individual analyses.
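The 80% cut-off used for the numbers on this page amounts to a simple filter over each system's output. Below is a minimal sketch of that filter, assuming that "80%+" means a confidence of at least 0.80; the `Prediction` type and the function name `keep_confident` are hypothetical, and the sample values are the object detections reported for the bagel in the US image.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    system: str
    label: str
    confidence: float


# Assumption: "80%+" is read as confidence >= 0.80 (inclusive).
CONFIDENCE_THRESHOLD = 0.80


def keep_confident(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Keep only predictions at or above the confidence threshold."""
    return [p for p in predictions if p.confidence >= threshold]


# Object detections reported for the "Everything" Bagel in the US image.
bagel = [
    Prediction("Azure", "Baked Goods", 0.77),
    Prediction("Vision", "Food", 0.63),
    Prediction("Rekognition", "Bread", 0.99),
]

for p in keep_confident(bagel):
    print(p.system, p.label, p.confidence)
```

With these inputs only the Rekognition "Bread" detection survives the filter, which is why the lower-confidence tables contain many more entries than the summary counts.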
Suggestions to Improve Image Recognition Systems to Recognize Food from Around the World? (Report) (Niels, published 2021-06-29 14:57:20, last modified 2021-07-07 12:27:09)
Overall, the systems performed very poorly. Not a single item was correctly detected and severe cultural misrepresentations were made. Vision was able to pick up on the visual characteristics of the BBQ, but unfortunately ascribed it to many different cultures and countries except for the Philippines.
Correctly predicted images: 0/3
Correctly detected items: 0/9
Correct labels: 4/104
Potentially harmful detections/labels: 35
The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
Object Detection
The object detection features failed to identify a single item across all three images. The descriptions that were given were very general (e.g. “Food”). Also, Rekognition identified rice as “Ice Cream” across all three images, while Vision also identified the rice as “Ice Cream” and “Dessert”. Many of the items remained undetected.
Labeling
The BBQ is perhaps one of the most visually distinctive items in the three images. Vision clearly recognized this distinctiveness, as it consistently labeled it as skewered, BBQ-like dishes such as Sate Kambing (Indonesian mutton satay), Shashlik (Caucasus/Central Asian skewered meat), Yakitori (Japanese skewered pork or chicken), and many more.
Unfortunately, none of these labels referred to the Philippines, which makes us question whether cultural misrepresentation is at play here (BBQ is a very common dish in the Philippines). It also raises the question of how far it is possible to distinguish between visually very similar dishes from different cultures based on an image without context.
While in previous analyses rice seemed somewhat easy to detect, Watson and Rekognition mistook rice for “Ice Cream” across multiple images. Vision also made this error (once as “Ice Cream” and another time as “Dessert”), though it did also label it as “Rice”. Here, again, it makes us wonder whether cultural misrepresentation is at play. Though one can imagine the visual similarities between ice cream and the round-shaped presentation of the rice in the images (which is common in many Asian countries), one cannot help but feel that ice cream was simply better represented in the training data.
As always, many labels were also too general or irrelevant.
My recommendation
Provide more specific and relevant labels for “BBQ” and “[Cup of] BBQ Sauce”;
Fix (cultural) misrepresentations (i.e. rice is not ice cream);
Understand the limits of how well an IR system could distinguish between similar dishes of different countries without further context or input, and what consequences this limit could have.
Results
Three images of one meal from the Philippines were available:
Meal 1: White rice and BBQ (dinner)
Object detection results.
GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON
White Rice | Undetected | Dessert (0.57) | Ice Cream (0.96) | /
BBQ | Undetected | Tableware (0.51) | Undetected | /
Cup of BBQ Sauce | Bowl (0.61) | Tableware (0.88) | Undetected | /
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling Results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOG.
IBM WATSON
food (0.98)
Food (0.98)
Ice Cream (0.96)
Ice Cream or Frozen Yoghurt (0.9)
person (0.94)
Tableware (0.96)
Dessert (0.96)
dessert (0.9)
cuisine (0.93)
Ingredient (0.91)
Cream (0.96)
nutrition (0.9)
snack (0.91)
Suya (0.9)
Creme (0.96)
food (0.9)
fast food (0.9)
Dishware (0.89)
Food (0.96)
Ice Cream Parlor (0.8)
dish (0.89)
Plate (0.88)
Meal (0.92)
shop (0.8)
dairy (0.85)
Shashlik (0.88)
Person (0.88)
retail store (0.8)
indoor (0.82)
Recipe (0.87)
Human (0.88)
building (0.8)
table (0.78)
Anticuchos (0.86)
Bowl (0.85)
chocolate color (0.61)
Cuisine (0.85)
Dish (0.8)
dark red color (0.57)
Dish (0.84)
Dish (0.8)
Brochette (0.84)
Sate kambing (0.83)
Fried food (0.75)
Beef (0.72)
Meat (0.72)
Cooking (0.72)
Produce (0.72)
Churrasco food (0.71)
Bowl (0.7)
Platter (0.69)
Fork (0.69)
Buffalo wing (0.67)
Finger food (0.66)
Comfort food (0.65)
Meal 1 (picture 2): White rice and BBQ (dinner)
Object detection results.
GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON
Rice | Undetected | Ice Cream (0.69) | Ice Cream (0.94) | /
BBQ | Undetected | Food (0.71) | Undetected | /
Cup of BBQ Sauce | Bowl (0.53) | Tableware (0.84) | Undetected | /
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling Results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOG.
IBM WATSON
food (0.99)
Food (0.98)
Ice Cream (0.94)
Ice Cream or Frozen Yogurt (0.89)
cuisine (0.92)
Tableware (0.95)
Dessert (0.94)
dessert (0.89)
dairy (0.91)
White rice (0.93)
Cream (0.94)
nutrition (0.89)
ice cream (0.89)
Ingredient (0.9)
Creme (0.94)
food (0.89)
person (0.62)
Recipe (0.88)
Food (0.94)
alizarine red color (0.84)
Plate (0.87)
Meal (0.89)
Ice Cream Parlor (0.62)
Jasmine rice (0.87)
Plant (0.77)
shop (0.62)
Staple food (0.85)
Dish (0.73)
retail store (0.62)
Sate kambing (0.84)
Outdoors (0.59)
building (0.62)
Shashlik (0.84)
Rice (0.84)
Brochette (0.83)
Glutinous rice (0.82)
Anticuchos (0.81)
Fork (0.81)
Suya (0.8)
Dish (0.78)
Nasi lemak (0.77)
Steamed rice (0.76)
Produce (0.76)
Basmati (0.75)
Cuisine (0.73)
Meat (0.72)
Fried food (0.71)
Comfort food (0.7)
Meal 1 (picture 3): White rice and BBQ (dinner)
Object Detection Results:
GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION | AMAZON REKOG. | IBM WATSON
Rice | Undetected | Food (0.6) | Ice Cream (0.81) | /
BBQ | Undetected | Undetected | Undetected | /
Cup of BBQ Sauce | Bowl (0.66) | Bowl (0.88) | Undetected | /
Labeling Results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOG.
IBM WATSON
snack (0.9)
Food (0.99)
Person (0.98)
building (0.82)
dairy (0.89)
Brochette (0.95)
Human (0.98)
food (0.8)
fast food (0.88)
Suya (0.94)
Ice Cream (0.81)
shop (0.73)
person (0.84)
Tableware (0.94)
Dessert (0.81)
retail store (0.73)
food (0.82)
Ingredient (0.91)
Cream (0.81)
chestnut color (0.73)
indoor (0.61)
Anticuchos (0.91)
Creme (0.81)
deli (0.64)
Recipe (0.88)
Food (0.81)
restaurant (0.55)
Shish taouk (0.88)
Meal (0.65)
bakery (0.5)
White rice (0.87)
Shashlik (0.87)
Sate kambing (0.85)
Pincho (0.84)
Satay (0.84)
Dish (0.83)
Arrosticini (0.83)
Rice (0.82)
Cuisine (0.82)
Souvlaki (0.81)
Plate (0.81)
Churrasco food (0.8)
Cooking (0.79)
Jasmine rice (0.79)
Yakitori (0.77)
Kebab (0.77)
Produce (0.75)
Rice is not Ice Cream: the Philippines (Niels, published 2021-06-24 16:25:20, last modified 2021-06-29 14:47:22)
Overall, the systems performed very poorly. No images were correctly described, nor were any items in the images correctly detected. Some labels such as “Rice” and “White Rice” described an item in the images, but in general the labels remained superficial. Many potentially harmful labels were found.
Correctly predicted images: 0/5
Correctly detected items: 0/10
Correct labels: 5/89
Potentially harmful detections/labels: 9
The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
Object Detection
Across the two meals, the object detection systems performed very disappointingly. Most items were described as the general terms of “Food” or “Bowl”, while two descriptions were completely off: crab cakes as well as Kadyos were detected as “Ice Cream” by Rekognition, while sliced melon was recognized as “Banana”. Though difficult to verify, these mistakes might indicate a (cultural) misrepresentation in the training data (i.e. more images of bananas than melons).
Labeling
Labeling remained surface level as well. General labels such as “Food”, “Dish” and “Plate” were common, as well as irrelevant labels such as “Ingredient” and “Recipe”. As is, the value of these labels is questionable, though this of course depends on the application.
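The distinction between general, irrelevant and more specific labels can be made explicit with a small lookup. The following is only a sketch of how such a tally might be done: the two sets contain just the examples named in the text, the function name `categorize` is hypothetical, and a real analysis would need a much longer, hand-curated list.

```python
# Only the examples named in the text; a real list would be hand-curated.
GENERAL = {"food", "dish", "plate"}
IRRELEVANT = {"ingredient", "recipe"}


def categorize(labels):
    """Sort labels into 'general', 'irrelevant' and 'other' buckets."""
    buckets = {"general": [], "irrelevant": [], "other": []}
    for label in labels:
        key = label.strip().lower()
        if key in GENERAL:
            buckets["general"].append(label)
        elif key in IRRELEVANT:
            buckets["irrelevant"].append(label)
        else:
            buckets["other"].append(label)
    return buckets


sample = ["Food", "Dish", "Plate", "Ingredient", "Recipe",
          "Stew", "Brown Sauce", "Comfort Food", "White Rice"]
for name, members in categorize(sample).items():
    print(name, members)
```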
More specific (though still somewhat general) labels were also found, including “Stew”, “Brown Sauce”, “Comfort Food”. As in previous cases, descriptions such as “Rice” and “White Rice” were plentiful as one image clearly depicted rice.
Unfortunately, the systems also gave many potentially harmful or simply wrong descriptions. In one instance, a meal was described as “German Food”. In other instances, the meals were labeled as dishes typical of other origins (e.g. “Semur” [an Indonesian dish], “Dumpling”, “Varenyky”) rather than Canadian/Filipino.
Azure seemed to be really confused by the image of three spring rolls, giving it 18 different descriptions of sausages (e.g. “Bratwurst”, “Loukaniko”, etc.) and not one of “Spring Rolls”. Rekognition also described the spring rolls as “Hot Dog” with 0.99 certainty.
As in the analyses of previous countries, types of meat were also often used interchangeably: “Pork”, “Beef”, “Chicken meat” and “Clam” would all be given as labels. Depending on the context, this could be detrimental to people of certain religions or with certain diets that avoid certain meats.
My recommendation
Provide more specific and relevant labels for “Kadyos”, “Sliced Melon”, “Crab cake” and “Spring Rolls”;
Fix (cultural) misrepresentations (i.e. crab cakes are not ice cream, sliced melon is not a banana, spring rolls are not sausages);
Make sure labels of meat do not harm people of certain religions or with certain diets.
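One way to act on the last recommendation is to flag any image whose labels name more than one kind of meat, so that a human can review it before the labels are shown to users. The sketch below is an assumption-laden illustration: the `MEAT_GROUPS` mapping is hypothetical and covers only the labels mentioned above, and the function name `conflicting_meats` is my own.

```python
# Hypothetical mapping from label text to a coarse meat group;
# it covers only the labels mentioned in the analysis above.
MEAT_GROUPS = {
    "pork": "pork",
    "beef": "beef",
    "chicken": "poultry",
    "chicken meat": "poultry",
    "clam": "shellfish",
}


def conflicting_meats(labels):
    """Return the sorted meat groups if the labels name more than one, else []."""
    groups = {
        MEAT_GROUPS[label.lower()]
        for label in labels
        if label.lower() in MEAT_GROUPS
    }
    return sorted(groups) if len(groups) > 1 else []


print(conflicting_meats(["Pork", "Beef", "Chicken meat", "Clam", "Food"]))
print(conflicting_meats(["Chicken", "Chicken meat", "Rice"]))
```

A non-empty result does not say which label is wrong, only that the labels disagree about the type of meat and should not be trusted for dietary or religious purposes.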
Results
Five images of two different meals from Canada (with a Filipino origin) were available:
All Sausage, but no Spring Roll: Canada (Niels, published 2021-06-23 15:59:22, last modified 2021-07-01 12:37:37)
Overall, the systems’ performances were disappointing. “Bread” appeared somewhat easy to detect, though “Soup” was not. The labeling was a bit more generous and could also detect the soup, though specific descriptions remained elusive. Finally, one wonders if labels such as “Gluten” and “Sugar” can be determined simply from a picture of a meal, and what the consequences of including these labels could be.
Correctly predicted images: 0/1
Correctly detected items: 0/3
Correct labels: 1/19
Potentially harmful detections/labels: 0
The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
The object detection features performed poorly on the selected image, though Azure and Rekognition did detect “Baked Goods” and “Bread”. Though the bagel itself was not specifically identified, the image’s perspective would perhaps lead many humans to say “Bread” instead of “Bagel” as well. However, Azure and Vision detected the soup simply as “Food” (undetected by Rekognition and Watson), which is a very general description for a somewhat typical part of a Western meal.
The labeling features again performed better. “Soup” was labeled, as well as “Bread”, “Baked goods”, “Bun”, “Seed”. Unfortunately, many labels were also too general (e.g. “Food”, “Meal”, “Dish”, etc.), irrelevant (e.g. “Recipe”, “Nutrition”, etc.), wrong (e.g. “Cake”, “Chocolate”, “Mole”, etc.), or a cultural misrepresentation (e.g. “Curry”).
Lastly, labels such as “Gluten” and “Sugar” are perhaps not wrong, but are hard to discern from an image (i.e. there are also gluten- and sugar-free bagels). As these could have a strong impact on people with certain diets, this leads one to wonder if such labels should be present in IR systems at all.
My recommendation
As always, the developers should implement more specific and relevant labels. Bread seems to be recognized well, so the developers can be proud of that. Developers should also look into labels such as “Gluten” and “Sugar” and if these can actually be recognized from a picture of a meal.
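Until that question is settled, an application that consumes these labels could at least keep ingredient claims apart from labels that describe what is visible. The sketch below illustrates the idea under stated assumptions: the set `NOT_VISUALLY_VERIFIABLE` is hypothetical and contains only the two labels discussed here, the function name `split_labels` is my own, and the sample values are taken from the labeling results for the US image.

```python
# Hypothetical set of labels that make an ingredient claim which
# cannot be confirmed from a picture alone.
NOT_VISUALLY_VERIFIABLE = {"gluten", "sugar"}


def split_labels(predictions):
    """Split (label, confidence) pairs into visual labels and ingredient claims."""
    visual, unverifiable = [], []
    for label, confidence in predictions:
        if label.lower() in NOT_VISUALLY_VERIFIABLE:
            unverifiable.append((label, confidence))
        else:
            visual.append((label, confidence))
    return visual, unverifiable


# A few of the labels reported for the bagel and soup image.
sample = [("Soup", 0.79), ("Bun", 0.78), ("Gluten", 0.76), ("Sugar", 0.69)]
visual, unverifiable = split_labels(sample)
print("visual:", visual)
print("needs other evidence:", unverifiable)
```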
Results
An image of one meal from the US was available:
Meal 1: New York “Everything” Bagel and Tomato Soup (Lunch)
Object detection results*:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Cup of Soup | Food (0.53) | Food (0.73) | Undetected | /
Spoon | Undetected | Undetected | Undetected | /
"Everything" Bagel | Baked Goods (0.77) | Food (0.63) | Bread (0.99) | /
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOGNITION
IBM WATSON
Baked Goods (0.99)
Food (0.99)
Bread (0.99)
Light Brown Color (0.85)
Food (0.97)
Ingredient (0.91)
Food (0.99)
Food (0.71)
Dessert (0.93)
Tableware (0.88)
Bun (0.88)
Reddish Orange Color (0.76)
Bread (0.93)
Recipe (0.87)
Bowl (0.64)
Nutrition (0.58)
Chocolate (0.83)
Cuisine (0.86)
Food Product (0.56)
Recipe (0.82)
Dish (0.85)
Sauce (0.56)
Cake (0.79)
Seed (0.81)
Condiment (0.56)
Delicious (0.78)
Staple Food (0.80)
Food Seasoning (0.56)
Fast Food (0.75)
Produce (0.80)
Food Ingredient (0.56)
Ingredient (0.60)
Soup (0.79)
Mole (0.54)
Gluten (0.60)
Bun (0.78)
Staple Food (0.54)
Gravy (0.78)
Gravy (0.52)
Cake (0.76)
Dish (0.51)
Gluten (0.76)
Bread (0.74)
Stew (0.73)
Curry (0.72)
Baked Goods (0.71)
Sugar (0.69)
Bowl (0.69)
Bread Roll (0.68)
Finger Food (0.67)
Baking (0.67)
Brown Bread (0.65)
Comfort Food (0.65)
To Gluten or not to Gluten.. The US (Niels, published 2021-06-21 16:34:00, last modified 2021-06-29 14:49:02)
Overall, the systems’ performances were disappointing. In one image, “Rice” was labeled by multiple systems (though not detected), but not in the other where the rice was much more visible. Labels were often too general or irrelevant. Some labels culturally misrepresented the Yemeni food and utensils as Indian or Western. One mislabeling instance was found that could disadvantage people with certain diets or religions.
Correctly predicted images: 0/2
Correctly detected items: 0/8
Correct labels: 3/35
Potentially harmful detections/labels: 0
The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
The object detection feature failed to provide specific descriptions of the objects across all four systems. The descriptions that were given remained surface level (e.g. “Bowl” instead of “Bowl of Chicken”, or “Bowl” instead of “Cup of Vegetable Sauce”) and many objects simply remained undetected.
Interestingly, the labeling features of Azure and Vision gave Indian-origin labels (e.g. “Masala”) to the first image. One explanation for this could be the plate on which the meal is served, which is also commonly used in Indian cuisine (a “Thali”). Perhaps the prevalence of Indian meals in the training images (i.e. because of a higher population, more common use of English, etc.) could contribute to this misrepresentation of a Yemeni meal as Indian food.
Several cultural misrepresentations were present as well. For instance, Rekognition labeled (presumably) the bowl of rice as “Oatmeal” and IBM Watson labeled the food as an “Irish Stew”. These misrepresentations should make us question if Yemeni food was represented enough in the training images.
The systems provided more correct labels on the second image (e.g. “Chicken”, “Rice”, and “Carrot”). This is an interesting outcome because, in the first image, all the items are separated in different bowls. One could assume that separate items help the systems to distinguish between the items, but this was not the case. It will be interesting to see if this happens on pictures from other countries as well, as serving food as separate items is common in many Asian countries and less so in Western countries.
Finally, in one instance, Vision labeled the food as “Seafood”, which could disadvantage people with certain diets (e.g. pescatarian) or religions.
My recommendation
As stated in the analyses of previous countries, the object detection features need to become better at detecting all the objects as well as at giving more specific descriptions. The latter is similarly true for the labeling features. Also, further attention is needed towards the idea that the recognition of Yemeni food is influenced by Indian training images. Developers should also be careful that the presentation of Yemeni food and Asian food in general (i.e. as separate items) does not impact their system’s performance. Vision appears to be very good at detecting “Carrot” (see also Vietnam), so congratulations to the developers for that.
Results
Two images of one meal from Yemen were available:
Meal 1: Rice, Cooked Chicken, Raw Vegetables, Vegetable Sauce (Lunch)
Object detection results*:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Rice | Undetected | Bowl (0.87) | Undetected | /
Bowl of Chicken | Bowl (0.58) | Food (0.78) | Undetected | /
Cup of Vegetable Sauce | Undetected | Bowl (0.80) | Undetected | /
Plate
Tableware (0.63)
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOGNITION
IBM WATSON
Plate (0.99)
Food (0.98)
Bowl (0.97)
Chestnut color (0.68)
Table (0.98)
Tableware (0.97)
Breakfast (0.91)
Food (0.65)
Food (0.98)
Dishware (0.88)
Food (0.91)
Orange Color (0.62)
Indoor (0.86)
Ingredient (0.88)
Produce (0.76)
Beverage (0.60)
Mixture (0.72)
Recipe (0.88)
Meal (0.71)
Nutrition (0.59)
Bowl (0.70)
Cuisine (0.84)
Dish (0.70)
Dish (0.58)
Masala (0.66)
Dish (0.81)
Plant (0.67)
Bowl (0.55)
Spoon (0.54)
Staple Food (0.81)
Oatmeal (0.60)
Tableware (0.55)
Tableware (0.50)
Bowl (0.79)
Utensil (0.55)
Mixture (0.77)
Slop Bowl (0.52)
Produce (0.75)
Serveware (0.70)
Comfort Food (0.69)
Masala (0.69)
Spoon (0.68)
Kitchen Utensil (0.68)
Metal (0.67)
Rice (0.67)
Mixing Bowl (0.66)
Breakfast (0.62)
Gravy (0.61)
South Indian Cuisine (0.61)
Tandoori Masala (0.59)
Side Dish (0.59)
Plate (0.59)
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Rice | Undetected | Food (0.76) | Undetected | /
Chicken | Undetected | / | Undetected | /
Raw Vegetables | Food (0.52) | Carrot (0.66) | Undetected | /
Labeling results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOGNITION
IBM WATSON
Food (0.90)
Food (0.98)
Plant (0.99)
Reddish Orange Color (0.97)
Plate (0.88)
Tableware (0.93)
Dish (0.96)
Dish (0.89)
Chicken (0.72)
Ingredient (0.90)
Meal (0.96)
Nutrition (0.89)
Dinner (0.65)
Recipe (0.87)
Food (0.96)
Food (0.89)
White Rice (0.60)
White Rice (0.85)
Vegetable (0.88)
Stew (0.85)
Steamed Rice (0.59)
Rice (0.82)
Rice (0.84)
Food Product (0.80)
Recipe (0.55)
Staple Food (0.78)
Bowl (0.63)
Tableware (0.79)
Glutinous Rice (0.52)
Cuisine (0.78)
Produce (0.56)
Bouilabaisse (0.71)
Dish (0.76)
Curry (0.56)
Irish Stew (0.68)
Produce (0.76)
Curry (0.50)
Meat (0.75)
Vegetable (0.75)
Seafood (0.75)
Stew (0.73)
Jasmine Rice (0.70)
Fast Food (0.69)
Plate (0.68)
Comfort Food (0.68)
Carrot (0.65)
Baby Carrot (0.64)
Gosht (0.64)
Mixture (0.63)
Brassicales (0.63)
Take-out Food (0.63)
Leaf Vegetable (0.62)
This is not India: Yemen (Niels, published 2021-06-19 17:53:41, last modified 2021-06-29 14:53:18)
Overall, the systems’ performances were disappointing. On the positive side, Amazon Rekognition did correctly label “Alcohol”, which is great because of alcohol’s sensitive nature in some contexts. Other than that, the systems failed to detect many objects, failed to provide specific labels, and presented many irrelevant labels.
Correctly predicted images: 0/2
Correctly detected items: 1/15
Correct labels: 1/38
Potentially harmful detections/labels: 0
The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
The object detection feature failed to provide specific descriptions of the objects across all four systems. The descriptions that were given remained surface level (e.g. “Food” instead of “Minestrone”, or “Bottle” instead of “Bottle of Wine”) and many objects simply remained undetected. In the case of Vision, the bounding boxes’ positions of the object detection feature were all scrambled and unintelligible (see picture below). It’s unclear why the bounding boxes showed up like this.
The bounding boxes provided by Google Vision. Source: Google Vision API
As was the case for the previous countries, the labeling feature performed slightly better, but still unsatisfactorily. Amazon Rekognition provided the labels “Alcohol” and “Wine” which, given the sensitive nature of alcohol in certain contexts, works well. Unfortunately, the other three systems failed to label these.
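Since the cause of the scrambled boxes is unclear, one thing that could be ruled out is malformed box data or a mix-up between normalized and pixel coordinates when drawing. The sketch below is a hedged illustration and not a diagnosis: it assumes each box is given as four (x, y) vertices normalized to the range 0 to 1, and the function names `box_problems` and `to_pixels` are hypothetical.

```python
def box_problems(vertices):
    """List problems for a box given as four (x, y) vertices normalized to 0..1."""
    if len(vertices) != 4:
        return ["expected 4 vertices"]
    problems = []
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    if any(not 0.0 <= value <= 1.0 for value in xs + ys):
        problems.append("coordinate outside 0..1")
    if max(xs) - min(xs) <= 0 or max(ys) - min(ys) <= 0:
        problems.append("zero width or height")
    return problems


def to_pixels(vertices, width, height):
    """Scale normalized vertices to pixel coordinates for drawing."""
    return [(round(x * width), round(y * height)) for x, y in vertices]


good_box = [(0.1, 0.2), (0.5, 0.2), (0.5, 0.6), (0.1, 0.6)]
bad_box = [(0.1, 0.2), (1.4, 0.2), (1.4, 0.6), (0.1, 0.6)]
print(box_problems(good_box), to_pixels(good_box, 200, 100))
print(box_problems(bad_box))
```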
Except for “Alcohol” and “Wine”, other labels remained very unspecific and uninformative. For instance, “Food”, “Plate”, “Dish”, “Kitchen Utensil” and such were common. None detected the olives, pesto, or Minestrone. In one case, the Minestrone was labeled as “Curry”, which could be considered a (cultural) misrepresentation.
My recommendation
As for the previous analyses, both the object detection and labeling features need much improvement. For object detection, this means being able to detect the various items in the first place, and giving more specific descriptions in the second place. The labeling feature of Amazon Rekognition correctly identified “Alcohol” and “Wine” (both sensitive items in certain contexts), and the other three systems would perhaps do well to also implement the identification of “Alcohol”.
Results
Two images of one meal from Germany were available:
Meal 1: Minestrone with Pesto and Olives, and a glass of Wine (Dinner)
Meal 1 (picture 1): Minestrone with Pesto and Olives, and a glass of Wine (Dinner)
Object detection results*:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Minestrone | Bowl (0.66) | Scrambled* | Undetected | /
Bowl of Pesto | Bowl (0.58) | Scrambled* | Undetected | /
Spoon | Undetected | Scrambled* | Undetected | /
Glass of wine | Undetected | Scrambled* | Undetected | /
Cup of Olives | Undetected | Scrambled* | Undetected | /
Wine Bottle | Bottle (0.56) | Scrambled* | Undetected | /
Spoon | Undetected | Scrambled* | Undetected | /
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOGNITION
IBM WATSON
Food (0.98)
Food (0.98)
Dish (0.95)
Olive Green Color (0.80)
Plate (0.89)
Tableware (0.97)
Meal (0.95)
Orange Color (0.66)
Ingredient (0.90)
Food (0.95)
Food (0.65)
Dishware (0.89)
Plant (0.73)
Dish (0.65)
Recipe (0.88)
Pottery (0.68)
Nutrition (0.65)
Serveware (0.85)
Alcohol (0.58)
Tableware (0.60)
Kitchen Utensil (0.84)
Beverage (0.58)
Side Dish (0.59)
Cuisine (0.83)
Drink (0.58)
Curry (0.51)
Dish (0.82)
Garnish (0.50)
Rectangle (0.80)
Vegetable (0.78)
Leaf Vegetable (0.78)
Soup (0.77)
Produce (0.74)
Mixture (0.74)
Curry (0.70)
Comfort Food (0.69)
Spoon (0.69)
Yellow Curry (0.68)
Cutlery (0.68)
Circle (0.68)
Garnish (0.68)
Condiment (0.67)
Plate (0.65)
Stew (0.65)
Meal 1 (picture 2): Minestrone with Pesto and Olives, and a glass of Wine (Dinner)
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Minestrone | Undetected | Food (0.78) | Undetected | /
Bowl of Minestrone | Bowl (0.75) | Food (0.68) | Undetected | /
Bowl of Pesto | Kitchen Utensil (0.52) | Food (0.55) | Undetected | /
Spoon | Undetected | Undetected | Undetected | /
Glass of wine | Undetected | Undetected | Undetected | /
Glass of wine | Cup (0.58) | Undetected | Undetected | /
Wine Bottle | Bottle (0.84) | Packaged Goods | Beer | /
Spoon | Undetected | Undetected | Spoon | /
Labeling results:
MICROSOFT AZURE
GOOGLE VISION
AMAZON REKOGNITION
IBM WATSON
Table (0.99)
Food (0.99)
Dish (0.99)
Nutrition (0.74)
Plate (0.99)
Tableware (0.97)
Meal (0.99)
Food (0.74)
Food (0.99)
Table (0.95)
Food (0.99)
Charcoal Color (0.62)
Indoor (0.98)
Bottle (0.94)
Spoon (0.92)
Dish (0.60)
Wall (0.98)
Dog (0.93)
Cutlery (0.92)
Piece de Resistance Dish (0.59)
Bottle (0.90)
Plate (0.90)
Alcohol (0.74)
Table (0.55)
Drink (0.75)
Dishware (0.90)
Beverage (0.74)
Furniture (0.55)
Tableware (0.65)
Ingredient (0.89)
Drink (0.74)
Dining Table (0.54)
Counter (0.60)
Recipe (0.87)
Person (0.67)
Dinner Table (0.53)
Dish (0.53)
Houseplant (0.85)
Human (0.67)
Plate (0.52)
Leaf vegetable (0.83)
Stew (0.65)
Kitchen Utensil (0.78)
Beer (0.62)
Vegetable (0.78)
Pasta (0.60)
Cooking (0.77)
Curry (0.58)
Cuisine (0.76)
Glass (0.58)
Companion Dog (0.75)
Wine (0.58)
Broccoli (0.75)
Restaurant (0.55)
Produce (0.74)
Serveware (0.74)
Garnish (0.74)
Dish (0.73)
Bowl (0.73)
Comfort Food (0.71)
Fork (0.70)
Culinary Art (0.70)
A bit more Wine, Please: Germany (Niels, published 2021-06-18 16:31:37, last modified 2021-06-29 14:53:43)
For the project Recognizing Food from Around the World, I’m testing images of food from around the world on four of the most popular image recognition systems. During this project, I’m trying to figure out (1) how well these systems work on real-life, original images, (2) how well they interpret nuances between cultures and countries, and (3) where they can make improvements.
So far, I’ve tested the systems on food from Belgium, Myanmar, Vietnam and Malaysia (9 meals and 11 images). With these admittedly limited samples, I wanted to write an interim conclusion to make the results of the project easily digestible.
Thus far, the results have been disappointing to very disappointing. Object detection consistently failed across the four systems. The systems labeled the images better, though these labels also often included too general or irrelevant descriptions.
In some cases, these descriptions culturally misrepresented the food (e.g. by mistaking the country or culture of the dish), which could be controversial in some contexts. In one instance, “Chicken” was described as “Beef”, which in some contexts could disadvantage people from certain religions. In another instance, “Mock Meat” was labeled as “Meat”, which could similarly disadvantage people from certain religions as well as vegetarians/vegans.
Please stay tuned for more results!
Interim Conclusion #1: Image Recognition Systems Disappoint on images of Food (Niels, published 2021-06-17 12:15:20, last modified 2021-06-29 14:44:12)
Overall, the systems’ performances were very disappointing. Object detection was not able to accurately detect any part of a meal, while labeling detected rice and a fork, but was too general for other items. One instance of cultural misrepresentation was also found.
Correctly predicted images: 0/1
Correctly detected items: 0/7
Correct labels: 2/19
Potentially harmful detections/labels: 0
The above table includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
The object detection features performed very poorly on the selected image. Only the very general terms of “Food” and “Tableware” were used to describe items such as “Bowl of Rice”, “Steamed Egg”, “Spoon” by Vision. Furthermore, “Cucumber Soup” was detected as “Plate” and “Chinese Iced Tea” as “Tableware”. All other systems failed to detect anything.
Vision’s labeling feature performed better, but still unsatisfactorily. For instance, it labeled “White Rice”, “Fork”, “Steamed Rice” and “Meat”, but failed to detect the “Chinese Iced Tea”, “Cucumber Soup”, “Steamed Egg”, etc. The other systems performed much worse, only managing to label in a very general manner (e.g. “Food”, “Plate”, “Tableware”, etc.).
Watson also culturally misrepresented the food and labeled it as “Taco”.
My recommendation
Developers should use more specific labels in order for the system to be useful. As the object detection system did not work well at all, a deeper analysis of what went wrong is needed.
Results
An image of one meal from Malaysia was available:
Meal 1: Rice, Steamed Egg, Sweet and Sour Pork, Cucumber Soup, Chinese Iced Tea (Dinner)
Object detection results*:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Rice | Undetected | Food (0.73) | Undetected | /
Steamed Egg | Undetected | Food (0.72) | Undetected | /
Sweet and Sour Pork | Undetected | / | Undetected | /
Spoon | Undetected | Tableware (0.56) | Undetected | /
Cucumber Soup | Undetected | Plate (0.84) | Undetected | /
Chinese Iced Tea | Undetected | Tableware (0.76) | Undetected | /
Fork | Undetected | Undetected | Undetected | /
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling results:
Microsoft Azure: Food (0.99), Table (0.96), Plate (0.95), Tableware (0.79), Breakfast (0.66), Dish (0.61), Fast food (0.60), Dinner (0.57), Bowl (0.52)
Google Vision: Food (0.99), Tableware (0.97), Ingredient (0.91), White Rice (0.90), Recipe (0.88), Cuisine (0.86), Plate (0.86), Kitchen Utensil (0.86), Fork (0.85), Dish (0.85), Nasi Kandar (0.81), Produce (0.79), Staple Food (0.76), Jasmine Rice (0.74), Meat (0.72), Steamed Rice (0.71), Serveware (0.69), Vegetable (0.69), Comfort Food (0.68), Rice (0.67), Fried Food (0.67), Lime (0.66), Condiment (0.66), Hayashi Rice (0.63), Papadum (0.61)
Amazon Rekognition: Plant (0.90), Food (0.90), Breakfast (0.87), Bowl (0.83), Meal (0.81), Vegetation (0.77), Dish (0.69), Lunch (0.66), Produce (0.65), Vegetable (0.57)
IBM Watson: Nutrition (0.71), Food (0.71), Dish (0.71), Taco (0.70), Shop (0.68), Retail Store (0.68), Building (0.68), Reddish Brown Color (0.66), Light Brown Color (0.64), Food Product (0.60)
“There’s more than Rice: Malaysia”, posted by Niels on 2021-06-17 11:23:38, last updated 2021-06-29 14:54:06
Overall, the systems’ performance was disappointing. Object detection failed miserably across all four systems. Labeling worked slightly better: rice was labeled in the first meal, but generally not in the other two. Overall, the systems missed a lot of nuance and provided predictions that could harm religious people and vegetarians/vegans.
Correctly predicted images: 0/4
Correctly detected items: 1/16
Correct labels: 9/95
Potentially harmful detections/labels: 0
The figures above include only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
The object detection features performed very poorly on the selected images: only one piece of carrot and one spoon were correctly identified (by Azure and Rekognition, respectively). Other items remained largely undetected (especially by Rekognition) or were described too generally.
As usual, the labeling features identified more items, but were also severely lacking in detail. For the first meal, the rice and vegetables were detected, though only Vision detected both. Nuances such as “fried vegetables” or “mixed vegetables” were not present. The soup was detected by Vision and Watson, though neither recognized it as “Bok Choy Soup”. Rekognition was able to label a spoon, which other systems missed. The chopsticks were not labeled.
For the second meal, rice was detected only by Vision, which is surprising as rice seemed easily detectable in the previous meal and in other cases. Perhaps this was due to the rather unusual, cut-off position of the rice in the image, though this is just a guess. As is common, Vision provided more labels, but this led to clear (cultural) misrepresentations (e.g. the labels “Guk” and “Sauerkraut”). One of Vision’s misrepresentations could, in some contexts, also cause harm to religious people or vegetarians/vegans: mock meat was labeled as meat with a moderately high confidence score (0.73).
Finally, for the third meal, only “Tomato”, “Iceburg Lettuce” [sic] and “Grilling” were identified; the chicken and rice were not. Much like the mock meat, the chicken was labeled as both “Beef” and “Steak”, which could cause harm or confusion to religious people. Also, as was the case in an image of the first meal, the chopsticks were not labeled. It will be interesting to see (in future analyses of other countries) whether this is due to cultural bias or simply a coincidence. Strangely, the rice was not detected even though it is clearly visible; perhaps this is due to its position in the image.
My recommendation
As in previous analyses, I recommend that developers use more specific labels and make sure that those labels don’t cause any misrepresentation. In this case, mock meat and different types of real meat were confused with one another. Unfortunately, this could harm religious people and vegetarians/vegans. Developers should also check that their systems work properly on chopsticks and not just on spoons and forks, though further analysis is needed here. Finally, Vision developers should probably fix the typo in the “Iceburg Lettuce” label.
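One way such a check could look is sketched below in Python. This is only an illustration under my own assumptions: the `CONFLICTS` map and the `flag_conflicts` function are hypothetical and not part of any of the four systems, and which labels count as conflicting for a given item is a judgment call. The confidences are the Vision labels reported for the second and third meal.

```python
# Hypothetical conflict map: ground-truth item -> labels that would
# misrepresent it in a potentially harmful way.
CONFLICTS = {
    "Mock Meat": {"Meat", "Beef", "Steak"},
    "Grilled Chicken": {"Beef", "Steak"},
}


def flag_conflicts(ground_truth, predictions, conflicts=CONFLICTS):
    """Return (item, label, confidence) for every prediction that conflicts
    with an item known to be in the image."""
    flagged = []
    for item in ground_truth:
        risky = conflicts.get(item, set())
        for label, confidence in predictions:
            if label in risky:
                flagged.append((item, label, confidence))
    return flagged


# Meal 2: the mock meat was labeled as "Meat".
meal_2 = flag_conflicts(
    ["Rice", "Cucumber Soup", "Mock Meat", "Lettuce"],
    [("White Rice", 0.82), ("Meat", 0.73), ("Guk", 0.72)],
)

# Meal 3: the grilled chicken was labeled as "Beef" and "Steak".
meal_3 = flag_conflicts(
    ["Rice", "Grilled Chicken", "Raw Vegetables"],
    [("Beef", 0.74), ("Steak", 0.74), ("Meat", 0.73), ("Grilling", 0.68)],
)

for item, label, confidence in meal_2 + meal_3:
    print(f"{item} labeled as {label} ({confidence:.2f})")
```

Note that a check like this needs the ground truth, so it only works for evaluating a system, not for correcting its output in production.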
Results
Images of three different meals from Vietnam were available:
Meal 1: Rice, Bok Choy Soup and Fried Mixed Vegetables (Dinner)
Meal 2: Rice, Cucumber Soup, Mock Meat and Lettuce (Lunch)
Meal 3: Rice, Grilled Chicken, and raw vegetables (Dinner)
Meal 1 (picture 1): Rice, Bok Choy Soup and Fried Mixed Vegetables (Dinner)
Object detection results*:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Rice | Undetected | Food (0.77) | Undetected | /
Fried Mixed Vegetables | Carrot (0.66) | Food (0.72) | Undetected | /
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling results:
Microsoft Azure: Food (0.99), Rice (0.98), Jasmine Rice (0.75), White Rice (0.73), Steamed Rice (0.70), Glutinous Rice (0.66), Cooking (0.60), Ingredient (0.58), Cuisine (0.56), Cheese (0.55), Homemade (0.52)
Google Vision: Food (0.99), Tableware (0.96), White Rice (0.92), Ingredient (0.91), Staple Food (0.88), Recipe (0.88), Jasmine Rice (0.87), Cuisine (0.87), Rice (0.86), Dish (0.86), Produce (0.79), Vegetable (0.77), Steamed Rice (0.76), Garnish (0.75), Bowl (0.75), Plate (0.75), Leaf Vegetable (0.73), Salad (0.69), Basmati (0.69), Glutinous Rice (0.67), Prepackaged Meal (0.67), Carrot (0.66), Brassicales (0.65), Meal (0.62), Supper (0.61)
Amazon Rekognition: Plant (0.99), Vegetables (0.96), Food (0.96), Produce (0.92), Bowl (0.57), Cutlery (0.57)
IBM Watson: Olive Green Color (0.96), Grain (0.94), Food Product (0.94), Food (0.94), Rice (0.83), White Rice (0.80), Wild Rice (0.62)
Meal 1 (picture 2): Rice, Bok Choy Soup and Fried Mixed Vegetables (Dinner)
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Rice | Bowl | Food (0.77) | Undetected | /
Fried Mixed Vegetables | Carrot (0.53) | Undetected | Undetected | /
Chopsticks | Kitchen Utensil (0.70) | Undetected | Undetected | /
Spoon | Kitchen Utensil (0.71) | Undetected | Undetected | /
Bok Choy Soup | Bowl | Food (0.75) | Undetected | /
Labeling results:
Microsoft Azure: Food (0.99), Plate (0.89), Bowl (0.86), Vegetable (0.75), Meal (0.53)
Google Vision: Food (0.98), Tableware (0.96), Ingredient (0.92), Staple Food (0.88), Recipe (0.88), White Rice (0.88), Rice (0.86), Cuisine (0.86), Leaf Vegetable (0.85), Dish (0.84), Jasmine Rice (0.79), Produce (0.78), Vegetable (0.77), Soup (0.76), Bowl (0.76), Mixing Bowl (0.73), Steamed Rice (0.70), Plate (0.70), Comfort Food (0.69), Dishware (0.68), Stock (0.68), Cooking (0.67), Namul (0.67), Garnish (0.66), Kitchen Utensil (0.66)
Amazon Rekognition: Plant (0.99), Produce (0.98), Food (0.98), Vegetable (0.95), Dish (0.90), Meal (0.90), Bowl (0.90), Sprout (0.68), Seasoning (0.57)
IBM Watson: Olive Green Color (0.96), Dish (0.91), Nutrition (0.91), Food (0.91), Greenishness Color (0.85), Food Product (0.80), Utensil (0.80), Salad (0.69), Seaweed Salad (0.69), Soup (0.68)
Meal 2: Rice, Cucumber soup, Mock Meat and Lettuce (Lunch)
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Rice | Undetected | Food (0.75) | Undetected | /
Spoon | Kitchen Utensil (0.78) | Undetected | Spoon | /
Lettuce | Undetected | Undetected | Undetected | /
Mock Meat | Food (0.52) | Food (0.78) | Undetected | /
Mock Meat | Undetected | Food (0.74) | Undetected | /
Cucumber Soup | Bowl (0.67) | Food (0.78) | Undetected | /
Labeling results:
Microsoft Azure: Food (0.99), Plate (0.99), Table (0.98), Salad (0.74), Broccoli (0.72), Container (0.54), Dinner (0.52), Cruciferous Vegetable (0.50)
Google Vision: Food (0.99), Tableware (0.97), Ingredient (0.91), Recipe (0.88), Staple Food (0.88), Fines Herbes (0.86), Dishware (0.86), Cuisine (0.85), Dish (0.84), Leaf Vegetable (0.84), White Rice (0.82), Produce (0.80), Vegetable (0.79), Plate (0.78), Bowl (0.78), Jasmine Rice (0.76), Glutinous Rice (0.75), Meat (0.73), Rice (0.73), Comfort Food (0.72), Guk (0.72), Sauerkraut (0.71), Soup (0.70), Basmati (0.70), Garnish (0.69)
Amazon Rekognition: Dish (0.95), Meal (0.95), Food (0.95), Plant (0.94), Bowl (0.93), Spoon (0.90), Cutlery (0.90), Vegetable (0.71), Produce (0.69), Seasoning (0.57)
IBM Watson: Olive Green Color (0.86), Food (0.85), Nutrition (0.85), Dish (0.85), Food Product (0.79), Side Dish (0.76), Bottle Green Color (0.51), Mushy Peas (0.50)
Meal 3: Rice, Grilled Chicken, and raw vegetables (Dinner)
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Rice | Bowl (0.69) | Tableware (0.76) | Undetected | /
Bowl of Rice | Bowl (0.69) | Tableware (0.66) | Undetected | /
Raw Vegetables | Undetected | Food (0.71) | Undetected | /
Chopsticks | Undetected | Undetected | Undetected | /
Grilled Chicken | Undetected | Food (0.73) | Undetected | /
Labeling results:
Microsoft Azure: Food (0.99), Fast Food (0.90), Healthy (0.89), Recipe (0.79), Fresh (0.78), Fruit (0.78), Tomato (0.69), Delicious (0.65), Container (0.63), Broccoli (0.61), Carrot (0.58), Ingredient (0.57), Dish (0.54), Salad (0.54), Produce (0.54), Tasty (0.54), Diet (0.52)
Google Vision: Food (0.99), Tableware (0.94), Ingredient (0.91), Recipe (0.88), Plate (0.83), Cuisine (0.82), Dish (0.82), Leaf Vegetable (0.81), Vegetable (0.79), Produce (0.78), Garnish (0.76), Natural Foods (0.75), Beef (0.74), Steak (0.74), Cooking (0.74), Meat (0.73), Citrus (0.73), Iceburg Lettuce (0.70), Comfort Food (0.69), Roasting (0.68), Grilling (0.68), Lime (0.67), Fast Food (0.67), Lemon (0.67), Fruit (0.66)
Amazon Rekognition: Plant (0.97), Meal (0.91), Food (0.91), Dish (0.89), Vegetable (0.81), Produce (0.71), Bowl (0.70), Dinner (0.57), Supper (0.57), Vase (0.57), Pottery (0.57), Jar (0.57)
IBM Watson: Nutrition (0.86), Food (0.86), Dish (0.86), Chestnut Red Color (0.80), Chestnut Color (0.75), Teriyaki (0.73), Sukiyaki (0.55), Barbecued Spareribs (0.50)
“Where are the Chopsticks? Vietnam”, posted by Niels on 2021-06-16 18:51:01, last updated 2021-06-29 14:55:33
Overall, the systems performed poorly, though all except IBM Watson correctly labeled “Rice”. Microsoft Azure and Google Vision were also a bit more specific, correctly labeling “Rice and curry”. Unfortunately, many labels were too general, irrelevant or clear misrepresentations. Microsoft Azure and Google Vision also mislabeled the places of origin, which could be sensitive and harmful to some.
Correctly predicted images: 0/3
Correctly detected items: 0/8
Correct labels: 6/56
Potentially harmful detections/labels: 4
The figures above include only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
Insights
The object detection features performed very poorly on the selected images: not a single item was correctly identified. Most of the items remained undetected, and the systems identified only a handful in a general manner. In two instances, the detected items were (cultural) misrepresentations (e.g. “popcorn” instead of “rice”) that could lead to harm. This leads me to conclude that the object detection features are, in this case, severely lacking.
The labeling features identified more items, but conversely also (culturally) misrepresented many more items, which could lead to harm. “Rice” appeared easy to detect by all systems except Watson. Azure and Vision were also able to more specifically detect “Rice and curry” as well as “Steamed Rice”. These two systems also identified multiple varieties of rice (e.g. Jasmine), although the rice was actually simply a local variety of Myanmarese rice. Of course, it would be very difficult even for humans to identify the variety of rice based on these pictures.
The systems, especially Vision, also suggested a lot more labels than, for example, for Belgium. This may correlate with why more labels were correctly identified (as described in the previous paragraph).
Unfortunately, more labels also seem to come with more (cultural) misrepresentations. While for Belgium these misrepresentations mainly consisted of wrongly naming a dish or ingredient, the case for Myanmar seems more severe, especially with Azure and Vision. Both systems assigned a (wrong) place of origin to dishes (e.g. “Sri Lankan Cuisine”, “Chinese food”, “Japanese curry”, “Takikomi Gohan”, etc.). Depending on the context, these misrepresentations could become sensitive and harmful to some (e.g. cultural appropriation). Of course, the presented meals are quite common across different countries and cultures, which could mitigate harm. Also, the confidence rates for these types of predictions were generally quite low (between 0.5 and 0.6).
Finally, the second meal included two images. Strangely, the systems performed quite differently on these two images, for object detection as well as labeling: different objects were detected and significantly different labels were given. Of course, in one image the Chicken Chili Curry with Mango Salad was on the rice itself, while in the other it was still in a plastic delivery bag (difficult to recognize even for humans). This could have influenced the labels. However, it would not explain why Vision had a significantly lower confidence score (-0.16) for “Rice” in the image where the rice was clearly more visible.
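The difference between the two pictures can be made explicit with a small comparison. The Python sketch below is only an illustration: the function name `confidence_shift` is hypothetical, and the dictionaries hold a handful of Vision labels that appear in the results for both pictures of the second meal.

```python
# A few Google Vision labels reported for both pictures of Meal 2.
picture_1 = {"Food": 0.98, "Tableware": 0.92, "Rice": 0.70,
             "Comfort Food": 0.67, "Vegetable": 0.55}
picture_2 = {"Food": 0.97, "Tableware": 0.73, "Rice": 0.54,
             "Comfort Food": 0.65, "Vegetable": 0.67}


def confidence_shift(before, after):
    """Confidence change per label, for labels present in both images."""
    shared = before.keys() & after.keys()
    return {label: round(after[label] - before[label], 2) for label in shared}


shift = confidence_shift(picture_1, picture_2)

# Print the largest drops first.
for label, delta in sorted(shift.items(), key=lambda pair: pair[1]):
    print(f"{label}: {delta:+.2f}")
```

For “Rice” this gives the -0.16 mentioned above (0.70 in picture 1 versus 0.54 in picture 2).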
My recommendation
Providing more labels perhaps comes with more correct predictions, but also with many more wrong predictions and misrepresentations. Developers should find a balance between the two. Developers should also be careful about assigning origins (e.g. “Sri Lankan cuisine”) to meals, as in this case they clearly did not match, leading to cultural misrepresentation. As was the case for Belgium, predictions should become more specific, as they currently often miss a lot of nuance.
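To illustrate this balance, the Python sketch below sweeps a confidence cut-off over the Azure labels for the first meal (as I read them from the labeling results below). The names `ORIGIN_LABELS` and `sweep` are hypothetical, and which labels count as origin labels is my own selection for this example.

```python
# Azure labels for Meal 1 (Rice with Fish Curry), with confidences.
azure_meal_1 = [
    ("Food", 0.99), ("Plate", 0.98), ("Jasmine rice", 0.92), ("Indoor", 0.91),
    ("White rice", 0.90), ("Steamed Rice", 0.88), ("Rice", 0.82),
    ("Rice and curry", 0.78), ("Arborio rice", 0.65),
    ("Sri Lankan Cuisine", 0.63), ("Japanese curry", 0.56),
    ("Takikomi Gohan", 0.55), ("Spiced rice", 0.52),
]

# Assumption: labels that attribute a (wrong) place of origin to the dish.
ORIGIN_LABELS = {"Sri Lankan Cuisine", "Japanese curry", "Takikomi Gohan"}


def sweep(labels, thresholds):
    """For each cut-off: labels kept, origin labels kept, and whether the
    specific (and correct) 'Rice and curry' label survives."""
    rows = []
    for threshold in thresholds:
        kept = [name for name, confidence in labels if confidence >= threshold]
        origin = [name for name in kept if name in ORIGIN_LABELS]
        rows.append((threshold, len(kept), len(origin), "Rice and curry" in kept))
    return rows


rows = sweep(azure_meal_1, (0.50, 0.65, 0.80))
for threshold, kept, origin, has_curry in rows:
    print(f"cut-off {threshold:.2f}: {kept} labels, "
          f"{origin} origin labels, 'Rice and curry' kept: {has_curry}")
```

In this one example a cut-off around 0.65 removes the origin labels while keeping “Rice and curry”, whereas 0.80 also drops that useful label. A single image is of course not enough to pick a threshold.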
Results
Images of two different meals from Myanmar were available:
Meal 1: Rice with Fish Curry (lunch)
Meal 2: Rice with Chicken Chili Curry (lunch)
Meal 1: Rice with Fish Curry (lunch)
Object detection results*:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Rice | Tableware (0.772) | Food (0.69) | Undetected | /
Fish Curry | / | / | / | /
*Green = the right prediction; Yellow= the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant
Labeling results:
Microsoft Azure: Food (0.99), Plate (0.98), Jasmine rice (0.92), Indoor (0.91), White rice (0.90), Steamed Rice (0.88), Rice (0.82), Rice and curry (0.78), Arborio rice (0.65), Sri Lankan Cuisine (0.63), Japanese curry (0.56), Takikomi Gohan (0.55), Spiced rice (0.52)
Google Vision: Food (0.98), White rice (0.94), Tableware (0.90), Jasmine Rice (0.90), Rice (0.88), Staple Food (0.88), Ingredient (0.88), Recipe (0.87), Glutinous Rice (0.87), Basmati (0.86), Cuisine (0.79), Steamed Rice (0.78), Produce (0.78), Dish (0.76), Arborio Rice (0.76), Xôi (0.74), Comfort Food (0.70), Chana Masala (0.70), Vegetable (0.70), Meat (0.68), Rice and Curry (0.66), Stew (0.66), Ghungi (0.61), Indian Cuisine (0.57), Koresh (0.55)
Amazon Rekognition: Plant (0.98), Food (0.89), Produce (0.89), Vegetable (0.89), Dish (0.83), Dish (0.83), Meal (0.83), Lentil (0.72), Bean (0.72), Sweets (0.58), Confectionery (0.58), Breakfast (0.57)
IBM Watson: Nutrition (0.89), Food (0.89), Dish (0.89), Food product (0.80), Tableware (0.79), Porcupine ball (0.72), Fried rice (0.66), Fried Calamari (0.50)
Meal 2 (picture 1): Rice with Chicken Chili Curry (lunch)
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Rice | Undetected | Bowl (0.67) | Undetected | /
Chicken Chili Curry with Mango Salad | Undetected | Food (0.68) | Undetected | /
Garlic | Undetected | Undetected | Undetected | /
Spoon | Undetected | Undetected | Undetected | /
Labeling results:
Microsoft Azure: Food (0.99), Table (0.98), Plate (0.97), Bowl (0.77), Fast food (0.71), Dish (0.51)
Google Vision: Food (0.98), Tableware (0.92), Staple food (0.88), Ingredient (0.88), Recipe (0.88), Cuisine (0.86), Mixture (0.83), Dish (0.80), Produce (0.77), Rice (0.70), Comfort Food (0.67), Superfood (0.67), Bowl (0.65), Spoon (0.64), Cutlery (0.64), Stuffing (0.64), Kitchen Utensil (0.63), Meat (0.61), Cooking (0.58), Food Additive (0.58), Breakfast cereal (0.57), Fast Food (0.56), Vegetable (0.55), Chinese Food (0.55), Thai Food (0.55)
Amazon Rekognition: Plant (0.99), Vegetable (0.92), Food (0.96), Produce (0.96), Sprout (0.80), Bean Sprout (0.61), Grain (0.60), Lentil (0.59), Bean (0.59), Rice (0.59), Meal (0.58)
IBM Watson: Pale yellow color (0.77), Utensil (0.71), Spoon (0.70), Emerald color (0.68), Ladle (0.61), Food product (0.60), Food (0.60), Scoop (0.60), Tableware (0.59), Tablespoon (0.57)
Meal 2 (picture 2): Rice with Chicken Chili Curry (lunch)
Object detection results:
Ground Truth | Microsoft Azure | Google Vision | Amazon Rekognition | IBM Watson
Bowl of Rice | Popcorn (0.78) | Packaged goods (0.69) | Undetected | /
Chicken Chili Curry with Mango Salad | Undetected | Packaged goods (0.90) | Pineapple | /
Labeling results:
Microsoft Azure: Table (0.99), Food (0.99), Plate (0.93)
Google Vision: Food (0.97), Ingredient (0.90), Cuisine (0.86), Recipe (0.86), Dish (0.85), Staple food (0.85), Tableware (0.73), Chemical Compound (0.69), Vegetable (0.67), Comfort Food (0.65), Plant (0.62), Produce (0.59), Oven Bag (0.58), Fashion Accessory (0.58), Jasmine Rice (0.57), Rice (0.54), Dairy (0.52)
Amazon Rekognition: Food (0.94), Rice (0.84), Produce (0.61), Pineapple (0.61), Fruit (0.61)
IBM Watson: Alabaster color (1), Shellfish (0.55), Invertebrate (0.55), Animal (0.55), Seasnail (0.55), Gastropod (0.55), Common limpet (0.54), Succulent (0.53), Plant (0.53), Feather ball (0.53)
“Finally some good Curry: Myanmar”, posted by Niels on 2021-06-16 12:37:37, last updated 2021-06-29 14:56:02