Suggestions to Improve Image Recognition Systems to Recognize Food from Around the World? (Report)

I’m testing four of the most popular image recognition (IR) systems (Microsoft Azure, Google Vision, Amazon Rekognition and IBM Watson) on self-collected, original images of food from around the world. The results so far, in numbers:

  • Correctly predicted images: 0/24 (0%)
  • Correctly detected items: 7/111 (6%)
  • Correct labels: 37/516 (7%)
  • Potentially harmful detections/labels: 49
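The percentages above are simple rounded ratios. As a minimal sketch (with the counts hard-coded from the list above, nothing else assumed):

```python
# Sketch: recomputing the aggregate percentages from the raw tallies above.
# Counts are taken directly from the results list in this report.
tallies = {
    "correctly predicted images": (0, 24),
    "correctly detected items": (7, 111),
    "correct labels": (37, 516),
}

for metric, (correct, total) in tallies.items():
    pct = round(100 * correct / total)  # rounded to whole percent
    print(f"{metric}: {correct}/{total} ({pct}%)")
```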

Update: 29/06

I’ve tested the IR systems on food from five new countries: the Philippines, Canada, the US, Yemen, and Germany (6 meals and 13 images). The results were poor across all countries: even when correct, detections and labels mostly remained too general (e.g. Food, Dish).

Results in numbers:

  • Correctly predicted images: 0/13 (0%)
  • Correctly detected items: 1/45 (2%)
  • Correct labels: 14/258 (5%)
  • Potentially harmful detections/labels: 44

Insights

Watson, Rekognition and, to a lesser degree, Azure seem trigger-happy about labeling food items (e.g. Rice and Kaydos) as Ice Cream. Is this due to a (Western) bias towards ice cream? It is too early to tell, but it is nonetheless an interesting direction to explore in future updates.

Though Rice was labeled as Ice Cream multiple times, at other times Rice was one of the most easily detected items, especially for Azure and Vision. This leads us to wonder where the decision threshold between Rice and Ice Cream lies.

Unfortunately, potentially harmful descriptions were common, especially misplaced references to the origin of a meal and confusion between types of meat. More blatant forms of cultural misrepresentation appeared too: Azure, for example, assigned 18 different sausage types to an image of spring rolls.

Finally, with labels such as Gluten and Sugar, one has to wonder what we can realistically expect from IR systems. How can these systems possibly know whether a Bagel is gluten- or sugar-free without any context? Even most humans would find this task impossible from an image alone.

Suggested improvements:

  • Provide more specific and relevant labels for:
    • BBQ, [Cup of] BBQ sauce, Kaydos, sliced melon, crab cake, spring rolls, bagel, minestrone, bottle of wine, olives and pesto
  • Fix (cultural) misrepresentations:
    • Rice, Kaydos and Crab Cakes are not ice cream, sliced melon is not a banana, spring rolls are not sausages, rice is not oatmeal;
    • A Thali (Indian serving plate) in an image possibly skews Yemeni food results towards Indian food results.
  • Understand the limits of IR systems and think about the consequences of these limits:
    • Can we expect IR systems to distinguish between similar dishes of different countries without further context or input?
    • Can we expect IR systems to detect if, for example, a meal is gluten or sugar free simply based on an image without further context or input?

Update: 17/06

So far, I’ve tested the systems on food from Belgium, Myanmar, Vietnam and Malaysia (9 meals and 11 images). With these admittedly limited samples, I wanted to write an interim conclusion to make the results of the project easily digestible.

Thus far, the results have been disappointing to very disappointing. Object detection consistently failed across the four systems. The systems performed better at labeling the images, though even these labels were often too general or irrelevant.

In some cases, these descriptions culturally misrepresented the food (e.g. mistaking the country or culture of a dish), which could be controversial in some contexts. In one instance, “Chicken” was described as “Beef”, which could disadvantage people of certain religions. In another, “Mock Meat” was labeled as “Meat”, which could similarly disadvantage people of certain religions as well as vegetarians and vegans.

Note: all numbers on this page include only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the countries’ individual analyses.
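Applying the same 80% cutoff across all four systems is straightforward once their responses are normalized to a common shape. A minimal sketch, assuming each result has already been mapped to a dict with "label" and "confidence" fields (these names are illustrative, not any provider's actual response schema):

```python
# Sketch: filtering normalized detection/label results by confidence.
# Assumes responses from all four IR systems have been mapped to a common
# shape: a list of dicts with "label" and "confidence" (0.0 to 1.0).
# Field names here are hypothetical, not any provider's actual schema.

def filter_by_confidence(results, threshold=0.80):
    """Keep only detections/labels at or above the confidence threshold."""
    return [r for r in results if r["confidence"] >= threshold]

# Hypothetical normalized output for one image:
labels = [
    {"label": "Food", "confidence": 0.97},
    {"label": "Ice Cream", "confidence": 0.83},
    {"label": "Rice", "confidence": 0.62},
]

# At the 80% cutoff used on this page, the 0.62 "Rice" entry is dropped.
print(filter_by_confidence(labels))
```

Lowering the threshold (as in the countries’ individual analyses) simply means passing a smaller value, e.g. `filter_by_confidence(labels, threshold=0.60)`.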