The need for datasets with unknowns

We hear a lot about artificial intelligence (AI) these days, but according to four Facebook researchers, not everyone may reap AI's benefits. One of the biggest causes of this? The problem of unknowns.

Images from around the world

In the article Does Object Detection Work for Everyone?, four Facebook researchers describe how they collected images of common household items (soap, spices, etc.) from families around the world. Important to know: the families' incomes varied widely, ranging anywhere from $25 to $20,000 a month.

When the four researchers tested the images on five popular image-recognition systems, they came to an interesting conclusion: the systems performed considerably worse on images from countries with lower household incomes.
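A comparison like this boils down to grouping test images by household income and computing accuracy per group. The sketch below illustrates that idea with made-up prediction records and a hypothetical income threshold; it is not the researchers' actual data or methodology.

```python
# Illustrative sketch: compare recognition accuracy across income groups.
# The records and the $500 threshold are assumptions, not the paper's data.
from collections import defaultdict

predictions = [
    # (household monthly income in USD, ground-truth label, predicted label)
    (27, "soap", "food"),
    (54, "spices", "bottle"),
    (110, "soap", "soap"),
    (3500, "soap", "soap"),
    (8000, "spices", "spices"),
    (19000, "soap", "soap"),
]

def accuracy_by_income(records, threshold=500):
    """Split records into low/high income groups and compute per-group accuracy."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for income, truth, pred in records:
        group = "low" if income < threshold else "high"
        totals[group] += 1
        hits[group] += (truth == pred)
    return {g: hits[g] / totals[g] for g in totals}

print(accuracy_by_income(predictions))
# → {'low': 0.3333333333333333, 'high': 1.0}
```

With these toy records, the low-income group gets one of three items right while the high-income group gets all three, mirroring the kind of gap the researchers report.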

The idea that image-recognition systems perform worse for some people is already widely documented. But what’s special about the article is how the researchers collected so many images the systems failed to recognize (or, in other words, were unknown to the systems). So how did they do this?

The Dollar Street dataset

The researchers obtained their images from Dollar Street, a non-AI project that aims to counter prejudices based on location and income. By showing images of simple household items from around the world, the project lets people see how others really live. For example, have you ever considered that spices can be kept in glass containers, empty plastic bottles with a corncob, plastic bags, or spice boxes?

So why did the Dollar Street images uncover so many unknowns? Well, image-recognition systems are typically tested (and trained) on images scraped from the internet. Of course, the internet disproportionately contains images from people with cameras, computers and internet access. People with low incomes and from certain geographies are thus underrepresented.

The researchers conclude that the systems' poor performance on certain images is primarily due to differences in the appearance of the items and their environments. Because many of the objects and environments in the test images were not commonly found on the internet, the AI systems had never really seen them before and thus could not recognize them. They were unknown to the systems.

What we can learn

So what can we learn from the article discussed above? First, we need to acknowledge that unknowns are still an enormous problem in the field of AI. Second, to find these unknowns, we need to validate AI systems with images from outside our common datasets as well as outside our own lived experience.

In other words, we have a responsibility to consider which contexts – such as income and geography – impact the performance of AI systems, and then go out into the wild to collect images at the corners and edges of those contexts.
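One simple way to surface candidate unknowns during validation is to flag test images where the system's top prediction confidence is low. The sketch below assumes a hypothetical list of (image id, top-1 confidence) pairs and an arbitrary threshold; both are illustrative assumptions, not part of the article.

```python
# Illustrative sketch: flag likely "unknowns" by low top-1 confidence.
# The image ids, scores, and 0.5 threshold are hypothetical assumptions.

def flag_unknowns(scored_images, threshold=0.5):
    """Return ids of images whose top prediction confidence falls below
    the threshold - candidates the system likely fails to recognize."""
    return [img_id for img_id, confidence in scored_images if confidence < threshold]

# Hypothetical (image id, top-1 confidence) pairs from some classifier.
scores = [("img_malawi_01", 0.12), ("img_us_07", 0.91), ("img_india_03", 0.34)]
print(flag_unknowns(scores))
# → ['img_malawi_01', 'img_india_03']
```

Flagged images can then be reviewed by hand, and the contexts they come from (income bracket, geography) can guide where to collect more validation data.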