Waffles are Easy: Singapore

Image recognition (IR) systems often perform poorly once deployed in the real world. In this post, I test four of the most popular IR systems on original real-world images of food from around the world, this time from Singapore.

Key takeaway

Finally, the IR systems performed somewhat well! Though object detection was still lacking, several systems correctly labeled the meal in two different pictures.

Of course, the meal contained only a single, simple, and visually distinguishable item (a waffle), but nevertheless a meal was correctly labeled in full for the first time.

The waffle was also almost detected correctly in one image, but its confidence rating of 76% fell just below our cutoff of 80%. Still, many faulty labels were present as well.

Correctly predicted images: 0/2
Correctly detected items: 2/6
Correct labels: 6/39
Potentially harmful detections/labels: 7
The table above includes only detections and labels with a confidence level of 80% or higher; for lower confidence levels, see the tables further below.
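The 80% cutoff can be sketched as a simple filter. This is only a minimal illustration, not the tooling used for the analysis: the function name `above_cutoff` and the constant `CUTOFF` are made up for this example, and the sample values are the second-image object detections from the results tables further below.

```python
# Confidence cutoff used in this post (80%), expressed as a fraction.
CUTOFF = 0.80

# A few (label, confidence) pairs copied from the second-image
# object detection results further below.
predictions = [
    ("Waffle", 0.76),     # Azure
    ("Tableware", 0.59),  # Vision
    ("Knife", 0.99),      # Rekognition
    ("Fork", 0.99),       # Rekognition
]


def above_cutoff(preds, cutoff=CUTOFF):
    """Return only the predictions at or above the cutoff (hypothetical helper)."""
    return [(label, conf) for label, conf in preds if conf >= cutoff]


kept = above_cutoff(predictions)
print(kept)
```

With these values, the waffle detection at 0.76 is dropped, which is why it does not count as a correctly detected item in the summary.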

Insights

Object Detection

For the first image, object detection did not work well. Only three of the nine possible detections were made, and all three were too general (e.g. food). However, for the second image, Azure and Rekognition (combined) detected all three items with a fairly high confidence rating.

Unfortunately, Rekognition also described the waffle as bread, which I feel is a missed opportunity and a clear misrepresentation. Vision’s descriptions remained too general for the second image as well.

Labeling

As always, the labeling systems performed better than the object detection systems. What stands out compared to previous analyses is that both Vision and Rekognition labeled the meal in both pictures with (very) high confidence ratings (85%-100%). Of course, unlike in those previous analyses, the meal consists only of a waffle, a fork, and a knife, all simple and visually distinguishable items. Nevertheless, they labeled them well.

In the first image, the waffle is spread open and clearly shows the texture of the Kaya and Margarine. This detail was not picked up by the IR systems. While that is understandable given its detailed nature, one has to wonder whether we can expect IR systems to pick up on such details. And if we can expect this from these systems, how much do visual similarities across different countries confuse such systems (and humans)?

For instance, I had personally never heard of Kaya, though its popularity in Singapore is undeniable. So, as a human coming from a Western country, I probably would have described it as butter. Butter looks very similar, yet that description clearly misses the mark. This is therefore a clear case where something looks visually similar but, depending on your background and the context surrounding the image, is something substantially different.

Some wrongly labeled items were also prominent. For the second image, Rekognition correctly labeled knife first with 99% confidence. However, the next three labels were weapon, blade, and weaponry, also with 99% confidence. While wrongly labeling a weapon as not a weapon would perhaps have worse consequences, one has to wonder about the consequences of labeling a simple table knife as a weapon.
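One way to surface such cases for human review is to check high-confidence labels against a watch list. The sketch below is purely illustrative: the watch list, the function name `flag_for_review`, and the choice of terms are assumptions made for this example, not part of the analysis. The label values are Rekognition's second-image results from the table further below.

```python
# Hypothetical watch list of labels a reviewer might want to double-check
# on a food photo. This list is an assumption for illustration only.
WATCH_LIST = {"weapon", "weaponry", "blade"}

# Rekognition labels for the second image, copied from the results table.
rekognition_labels = [
    ("Knife", 0.99), ("Weapon", 0.99), ("Blade", 0.99),
    ("Weaponry", 0.99), ("Fork", 0.99), ("Cutlery", 0.99),
    ("Food", 0.91), ("Bread", 0.89), ("Waffle", 0.85),
]


def flag_for_review(labels, watch_list=WATCH_LIST, cutoff=0.80):
    """Return watch-listed labels at or above the confidence cutoff."""
    return [
        label
        for label, conf in labels
        if conf >= cutoff and label.lower() in watch_list
    ]


flagged = flag_for_review(rekognition_labels)
print(flagged)  # ['Weapon', 'Blade', 'Weaponry']
```

Which terms belong on such a list depends entirely on the application; a table knife in a meal photo and a knife in a security context call for different handling.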

Finally, Vision labeled the waffle as Belgian waffle with the same confidence as a waffle (95%). One wonders if the fame of Belgian waffles influenced the prediction of Vision. Again, one also has to wonder to what degree an IR system can determine the origin of a meal.

Suggestions for improvement

  • Address (cultural) misrepresentations (e.g. not all waffles are Belgian waffles);
  • Understand the limits of IR systems and think about the consequences of these limits:
    • Can we expect IR systems to detect if, for example, a waffle has Kaya and Margarine on it, based simply on an image without further context or input?
  • Understand the consequences of labeling a simple dinner knife as a weapon with high confidence.

Results

Two images of one meal from Singapore were available:

  • Meal 1: Waffle with Kaya and Margarine (Dessert)

Object detection results (first image):

| GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION    | AMAZON REKOG. | IBM WATSON |
| Waffle       | Food (0.59)     | Food (0.77)      | Undetected    | /          |
| Fork         | Undetected      | Undetected       | Undetected    | /          |
| Knife        | Undetected      | Tableware (0.67) | Undetected    | /          |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling Results:

| MICROSOFT AZURE** | GOOGLE VISION          | AMAZON REKOG. | IBM WATSON                |
|                   | Food (0.99)            | Waffle (1)    | food (0.95)               |
|                   | Tableware (0.96)       | Food (1)      | beige color (0.93)        |
|                   | Waffle (0.95)          |               | bread (0.86)              |
|                   | Belgian waffle (0.95)  |               | food product (0.86)       |
|                   | Ingredient (0.91)      |               | nan (0.77)                |
|                   | Baked goods (0.87)     |               | Chicken Quesadilla (0.76) |
|                   | Staple food (0.87)     |               | dish (0.76)               |
|                   | Fast food (0.86)       |               | nutrition (0.76)          |
|                   | Cuisine (0.85)         |               | flatbread (0.5)           |
|                   | Recipe (0.85)          |               |                           |
|                   | Dish (0.82)            |               |                           |
|                   | Finger food (0.73)     |               |                           |
|                   | Junk food (0.72)       |               |                           |
|                   | Dessert (0.72)         |               |                           |
|                   | Produce (0.72)         |               |                           |
|                   | Plate (0.7)            |               |                           |
|                   | Dishware (0.69)        |               |                           |
|                   | Comfort food (0.65)    |               |                           |
|                   | Sweetness (0.65)       |               |                           |
|                   | Kitchen utensil (0.64) |               |                           |
|                   | Snack (0.64)           |               |                           |
|                   | Delicacy (0.63)        |               |                           |
|                   | Waffle iron (0.63)     |               |                           |
|                   | Breakfast (0.61)       |               |                           |
|                   | Meal (0.59)            |               |                           |

**It appears that the Azure labeling API was not returning any results at the time of analysis (only _others with a confidence rating of 0.004; model version 2021-05-01, whereas the object detection API used model version 2021-04-01).

Object detection results (second image):

| GROUND TRUTH | MICROSOFT AZURE | GOOGLE VISION    | AMAZON REKOG. | IBM WATSON |
| Waffle       | Waffle (0.76)   | Food (0.74)      | Bread (0.89)  | /          |
| Knife        | Undetected      | Tableware (0.59) | Knife (0.99)  | /          |
| Fork         | Undetected      | Undetected       | Fork (0.99)   | /          |

*Green = the right prediction; Yellow = the right prediction, but too general; Red = potentially harmful prediction; White = largely not relevant

Labeling Results:

| MICROSOFT AZURE** | GOOGLE VISION         | AMAZON REKOG.   | IBM WATSON         |
|                   | Food (0.98)           | Knife (0.99)    | beige color (0.98) |
|                   | Belgian waffle (0.96) | Weapon (0.99)   | utensil (0.63)     |
|                   | Tableware (0.96)      | Blade (0.99)    | food (0.6)         |
|                   | Waffle (0.96)         | Weaponry (0.99) | food product (0.6) |
|                   | Hood (0.9)            | Fork (0.99)     | tableware (0.56)   |
|                   | Plate (0.89)          | Cutlery (0.99)  | tablefork (0.55)   |
|                   | Ingredient (0.89)     | Food (0.91)     | restaurant (0.55)  |
|                   | Recipe (0.86)         | Bread (0.89)    | building (0.55)    |
|                   | Cuisine (0.82)        | Waffle (0.85)   | cafe (0.54)        |
|                   | Baked goods (0.82)    |                 | spoon (0.51)       |
|                   | Dish (0.8)            |                 |                    |
|                   | Kitchen utensil (0.8) |                 |                    |
|                   | Grille (0.79)         |                 |                    |
|                   | Dishware (0.79)       |                 |                    |
|                   | Staple food (0.75)    |                 |                    |
|                   | Pizzelle (0.73)       |                 |                    |
|                   | Waffle iron (0.72)    |                 |                    |
|                   | Fork (0.72)           |                 |                    |
|                   | Dessert (0.71)        |                 |                    |
|                   | Sweetness (0.7)       |                 |                    |
|                   | Comfort food (0.69)   |                 |                    |
|                   | Produce (0.69)        |                 |                    |
|                   | Finger food (0.66)    |                 |                    |
|                   | Junk food (0.66)      |                 |                    |
|                   | Cooking (0.64)        |                 |                    |

**It appears that the Azure API was not returning any results at the time of analysis (only abstract_ with a confidence rating of 0.004).