Timnit Gebru looks around the AI world and sees almost no one who looks like her. That’s a problem for all of us.
Artificial intelligence is an increasingly seamless part of our everyday lives, present in everything from web searches to social media to home assistants like Alexa. But what do we do if this massively important technology is unintentionally, but fundamentally, biased? And what do we do if this massively important field includes almost no black researchers? Timnit Gebru is tackling these questions as part of Microsoft’s Fairness, Accountability, Transparency, and Ethics in AI group, which she joined last summer. She also cofounded the Black in AI event at the Neural Information Processing Systems (NIPS) conference in 2017 and was on the steering committee for the first Fairness and Transparency conference in February. She spoke with MIT Technology Review about how bias gets into AI systems and how diversity can counteract it.
How does the lack of diversity distort artificial intelligence and specifically computer vision?
I can talk about this for a whole year. There is a bias to what kinds of problems we think are important, what kinds of research we think are important, and where we think AI should go. If we don’t have diversity in our set of researchers, we are not going to address problems that are faced by the majority of people in the world. When problems don’t affect us, we don’t think they’re that important, and we might not even know what these problems are, because we’re not interacting with the people who are experiencing them.
Are there ways to counteract bias in systems?
The reason diversity is really important in AI, not just in data sets but also in researchers, is that you need people who just have this social sense of how things are. We are in a diversity crisis for AI. In addition to having technical conversations, conversations about law, conversations about ethics, we need to have conversations about diversity in AI. We need all sorts of diversity in AI. And this needs to be treated as something that’s extremely urgent.
From a technical standpoint, there are many different kinds of approaches. One is to diversify your data set and to have many different annotations of your data set, like race and gender and age. Once you train a model, you can test it out and see how well it does by all these different subgroups. But even after you do this, you are bound to have some sort of bias in your data set. You cannot have a data set that perfectly samples the whole world.
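The per-subgroup check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from Gebru's work: the evaluation records, the annotation keys (`gender`, `age`), and the function name `accuracy_by_subgroup` are all hypothetical and invented for the example.

```python
from collections import defaultdict


def accuracy_by_subgroup(records, group_key):
    """Return {subgroup value: accuracy} for one annotation key.

    Each record is a dict holding the true label, the model's prediction,
    and demographic annotations (hypothetical keys, for illustration only).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[group_key]
        total[group] += 1
        if record["predicted"] == record["label"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy evaluation records, fabricated purely to show the mechanics.
records = [
    {"label": 1, "predicted": 1, "gender": "female", "age": "18-40"},
    {"label": 0, "predicted": 1, "gender": "female", "age": "41+"},
    {"label": 1, "predicted": 1, "gender": "male", "age": "18-40"},
    {"label": 0, "predicted": 0, "gender": "male", "age": "41+"},
    {"label": 1, "predicted": 0, "gender": "female", "age": "18-40"},
    {"label": 0, "predicted": 0, "gender": "male", "age": "18-40"},
]

# Overall accuracy can hide large gaps between subgroups.
overall = sum(r["predicted"] == r["label"] for r in records) / len(records)
print(f"overall: {overall:.2f}")

for key in ("gender", "age"):
    for group, acc in sorted(accuracy_by_subgroup(records, key).items()):
        print(f"{key}={group}: {acc:.2f}")
```

In this toy data the model is right on every "male" record and on only one of three "female" records, a gap that the single overall number does not reveal. As the answer notes, a check like this only covers the subgroups that were annotated and sampled in the first place.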
Something I’m really passionate about and I’m working on right now is to figure out how to encourage companies to give more information to users or even researchers. They should have recommended usage, what the pitfalls are, how biased the data set is, etc., so that when I’m a startup and I’m just taking your off-the-shelf data set or off-the-shelf model and incorporating it into whatever I’m doing, at least I have some knowledge of what kinds of pitfalls there may be. Right now we’re in a place almost like the Wild West, where we don’t really have many standards [about] where we put out data sets.