We’ve all been there: we’re in the car or doing chores at home when we think of a song we want to listen to on our smartphone. So we say “Hey Siri” or “Hey Google” to put in the request and keep doing what we’re doing. But Siri and Google don’t pick up on what we said the first time. Or the second time. And by the third time, we’re just reaching for the phone to search for the song manually. Maybe it’s the name of the song, or the way we’re pronouncing it, but in this situation, being misunderstood by speech recognition software powered by artificial intelligence (AI) is comical at best and frustrating at worst.
But what if AI’s misunderstanding was the difference between life and death, or success and failure?
That’s an everyday reality for many non-white, non-male, and non-American tech users across the world. Research shows that AI is consistently biased in favor of white, English-speaking men over other demographics. This bias particularly impacts women of color, with Black women facing the most negative experiences with AI technology.
The 2018 study “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” compared gender classifiers developed by Microsoft, IBM, and Chinese startup Face++ on their ability to accurately recognize whether an image portrayed a man or a woman, and examined how that accuracy varied depending on whether the person pictured was white or Black. As explained in the article “Facial recognition software easily IDs white men, but error rates soar for black women”, across all three technologies the error rate was extremely low for men, with white men having the lowest error rates of all, and higher for women, with error rates for Black women being 29 percentage points higher than the average error rate for white men.
The failure of AI for women and communities of color doesn’t just fall in the realm of facial recognition technology, but also, as referenced earlier, in voice recognition technology. In 2018, researchers partnered with The Washington Post to study the inequities in voice recognition technology for Google Assistant and Amazon Alexa. The results of this study were published on The Washington Post’s website, in an article titled “The Accent Gap”, and showed stark differences in technology’s ability to understand and respond to accents from over 100 people from 20 cities.
While some of the data showed smaller gaps, such as Southern American accents being 3% less likely to be understood by the technology than Western American accents, the largest gaps appeared for non-native English speakers. Across the board, for non-native English speakers, inaccuracies occurred 30% more often than for those who grew up speaking American English. For example, individuals who speak Spanish as their first language were misunderstood 6% more often than individuals who grew up speaking English on the West Coast, where many tech companies are based.
The reasoning for this is straightforward, according to data scientist Dr. Rachael Tatman.
“These systems are going to work best for white, highly educated, upper-middle-class Americans, probably from the West Coast, because that’s the group that’s had access to the technology from the very beginning.”
Dr. Tatman’s study, “Gender and Dialect Bias in YouTube’s Automatic Captions”, shows that not only are speakers of diverse dialects poorly served by voice recognition technology, but that women are also shortchanged by AI’s ability to understand and respond to voice. According to the study, women posting content on YouTube are 13% more likely to be misunderstood by the site’s automatic closed captioning when compared to men. This is particularly damaging, considering that YouTube’s automatic closed captioning is in place to ensure equity for individuals who are deaf or hard of hearing.
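For readers curious how gaps like these are measured, here is a minimal sketch of the basic arithmetic: tally how often a system gets it wrong for each group, then compare the groups. It is not the method used by any of the studies above. The group names, the toy records, and both function names are made up purely for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error rate} from (group, was_correct) pairs."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, was_correct in records:
        totals[group] += 1
        if not was_correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def gap_in_percentage_points(rates, group, baseline):
    """Absolute difference between two error rates, in percentage points."""
    return (rates[group] - rates[baseline]) * 100


# Hypothetical data: 10 attempts per group, not taken from any real study.
records = (
    [("group_a", True)] * 9 + [("group_a", False)] * 1
    + [("group_b", True)] * 7 + [("group_b", False)] * 3
)

rates = error_rates_by_group(records)
gap = gap_in_percentage_points(rates, "group_b", "group_a")
ratio = rates["group_b"] / rates["group_a"]

print(f"Error rates: {rates}")
print(f"Gap: {gap:.0f} percentage points")
print(f"group_b is misunderstood {ratio:.1f}x as often as group_a")
```

The sketch also shows why wording matters when reading these studies: a gap in "percentage points" is a subtraction of two error rates, while "X% more often" is a ratio between them, and the two can look very different for the same data.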
So, how do these errors and technological failures affect under-resourced members of our community? In more ways than you might think.
For differently-abled folks who rely on recent advances in technology to make their day-to-day lives easier, technology that doesn’t understand their voice, doesn’t correctly translate voice to text, or won’t recognize their face could leave them in a difficult situation.
As more and more companies and organizations, including police forces, rely on facial recognition technology to identify employees, clients, and potential suspects, hearing that people of color and women can be misidentified should be cause for concern.
Unlike many forms of technology, artificial intelligence has the ability to learn through exposure to and interactions with humans. Siri, for example, learns how to better serve its users over time by building on common voice commands and patterns of use. Putting that ability to work for everyone is the goal of Mozilla Common Voice. Through its website, Mozilla Common Voice offers the opportunity for anyone, from any background, to contribute voice recordings of common words, such as numbers, in an effort to diversify the recordings being used to teach AI to understand the human voice. The goal of Mozilla Common Voice is to “help make voice recognition open and accessible to everyone”.
In order to lessen, and eventually eliminate, biases in AI, it is imperative to ensure that not only are these technologies made available to all individuals, but that individuals from Black and Brown communities are in the room during the creation and implementation of these technologies. With wider exposure and more diverse teams of engineers, scientists, and software developers, AI can, like a child learning how to respect others, learn to better serve women, people of color, and other members of underrepresented communities. It is these actions that can help to democratize artificial intelligence and make it accessible (and equitable) for all.