It’s tempting to think of artificial intelligence (AI) algorithms as a source of objective truth based on data, but that’s really not how they work. Computers can only do the things they’re told to do by people. The data they use to learn and improve also comes from people. And the harmful things that people sometimes say and do are things that computers learn from.
What is bias, and why is it important to address?
Computers don’t have insight into why their data is the way it is, or what data might be missing due to flaws in human society. They don’t know whether the ways they learn to make decisions are fair or unfair. This leads to problems such as:
- The ubiquitous example of COMPAS, an algorithm that is actually used in real people’s criminal sentencing decisions, and which has a higher error rate for Black defendants despite race not being one of its inputs
- An Apple / Goldman Sachs credit card whose unexplainable credit-limit decisions have led to cases of women receiving drastically lower limits than comparable men – even when the woman’s finances are better, and even though its implementers claim it “doesn’t see gender”
- A tool called “Perspective API” that’s pitched as a way to filter out toxicity and make the Internet a kinder place, but that turns out to disproportionately filter out women and minorities while letting white supremacist statements through
We’ve written previously about the problem of bias in natural language processing (NLP), and how it’s important to address it. We don’t want the harmful things that computers learn from people to leak into business use cases when it comes to analyzing text, and moreover, we just don’t want to propagate these harmful things at all.
De-biasing is the area of machine learning research that aims to address this kind of problem, and it is part of the broader field of “fairness and transparency”. All of the examples above are now case studies in that field.
What are the risks of bias in the kind of NLP we do at Luminoso?
Our clients use our products to analyze all different forms of text feedback they receive from the people they serve. This feedback includes open-ended survey responses, product reviews, conversational data, historical documents such as maintenance notes … any text that helps organizations better understand their customers’ and employees’ needs, wants, and pain points.
Fortunately, we aren’t trying to get our products to make high-stakes decisions that ought to be made by people. We’re just trying to organize text by what it means, uncover trends in feedback that may otherwise go unread, and make free-form text measurable.
Even so, we recognized the importance of addressing bias in NLP, and started learning about it and finding out what we could do to keep it out of our products.
The bias our system encounters, and where we address it
Word embeddings – which build the background knowledge that we use in Luminoso products and ConceptNet – are pre-trained on large amounts of text from the Web. The bias we encounter and what we address in particular are the prejudices and unfortunate associations that can be learned in this step.
Luminoso products go beyond providing a typical “supervised machine learning” pipeline from input data to predictions – they provide features for exploring data, where a human user interacts with the machine learning results to gain a better understanding of what’s going on. So the issue we face is more than just biased predictions: it’s that our system could be influenced by unfortunate associations it learned during pre-training from the Web.
A paper by Suresh and Guttag breaks down five kinds of bias and the different places they can occur in a machine learning system.
The biases in their framework that are particularly involved in the pre-trained background knowledge used in our products are historical bias and representation bias.
Historical bias occurs because pre-trained representations can only learn from text that already exists, giving systems the regressive tendency to learn the inequalities of the past and use them to make predictions about new text. For example, a system that sees the word “surgeon” and thinks it must refer to a man is suffering in part from historical bias.
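To make that concrete, here’s a minimal sketch of how you can probe an off-the-shelf embedding for this kind of association. It assumes the gensim library and its downloadable GloVe vectors; it isn’t our pipeline, just an illustration of what pre-trained vectors tend to pick up:

```python
# A minimal sketch, assuming gensim and its downloadable GloVe vectors
# ("glove-wiki-gigaword-100"). This probes an off-the-shelf embedding for
# gendered associations; it is an illustration, not Luminoso's pipeline.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads on first use

# Compare how strongly some occupation words associate with gendered pronouns.
for occupation in ["surgeon", "nurse", "engineer", "receptionist"]:
    he = vectors.similarity(occupation, "he")
    she = vectors.similarity(occupation, "she")
    print(f"{occupation:>14}: he={he:.3f}  she={she:.3f}  difference={he - she:+.3f}")
```

The exact numbers depend on which vectors you load, but occupation words routinely lean toward one gendered pronoun or the other, even though nothing about the word “surgeon” requires that.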
Representation bias appears because some viewpoints are underrepresented in the Web pages that make up web corpora, and some viewpoints are overrepresented.
One particular representation bias I noticed – one which would definitely appear in our product if it went unfixed – is one you might have heard about from a Broadway musical: a large number of pages on the Internet are, in fact, porn.
This is where the idea of “unfortunate associations” comes in. There are a number of words that should be benign – completely ordinary words describing people, for example – that take on a different meaning when they’re used in the kind of text meant to attract people searching for porn. This text is often misogynistic, it’s not the kind of thing you want appearing in your organization’s NLP application, and it is way, way overrepresented in the data that word embeddings learn from. This is one of the effects we correct for.
As just one example of how this kind of data can sneak into visible results, here’s a description of how a feedback loop between Twitter bots and Twitter’s machine learning made Twitter highlight pornographic hashtags as “trending”.
Is our approach the right solution to bias?
A little while back, I wrote a cautionary tutorial about racist NLP. At the time, I worried that I was giving it too much of a satisfying conclusion. A too-brief summary of the article was:
- Hey look, there’s this problem in NLP that’s so pervasive that you encounter it even when running a very simple tutorial on widely-available data.
- But we fixed it, so it’s okay as long as your dataset is ConceptNet.
That’s not it. It’s not that simple, and I tried to state the result cautiously for this reason. It’s easy to over-promise about things like this, and I know there’s a tendency for people in the tech industry to promise miracle cures for AI bias. Here’s the subtler description of what we’re doing:
We know that machine learning is unreasonably good at finding hidden correlations. If you leave it any way to learn a prejudice from these hidden correlations at any stage, it will learn it. And the tutorial shows that you can use responsibly-collected data that looks perfectly safe, and you still end up with bias that comes directly from your initial word embeddings.
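Here’s a condensed sketch of that effect, assuming gensim’s GloVe vectors and scikit-learn. The tiny word lists stand in for a real sentiment lexicon; the point is to show the mechanism, not to reproduce the tutorial exactly:

```python
# A condensed sketch of bias leaking from embeddings into a downstream model,
# assuming gensim's GloVe vectors and scikit-learn. The word lists below are
# tiny stand-ins for a real sentiment lexicon.
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

vectors = api.load("glove-wiki-gigaword-100")

positive = ["good", "great", "wonderful", "happy", "excellent", "love"]
negative = ["bad", "terrible", "awful", "sad", "horrible", "hate"]

# Train a word-level sentiment model on embedding vectors.
X = np.vstack([vectors[w] for w in positive + negative])
y = np.array([1] * len(positive) + [0] * len(negative))
clf = LogisticRegression().fit(X, y)

def sentence_score(sentence):
    """Average the positive-class probability over in-vocabulary words."""
    words = [w for w in sentence.lower().split() if w in vectors]
    return clf.predict_proba(np.vstack([vectors[w] for w in words]))[:, 1].mean()

# The training words never mention any group of people, but sentences that
# differ only in such a word can still come out with different scores.
print(sentence_score("let us go get italian food"))
print(sentence_score("let us go get mexican food"))
```

The sentiment lexicon never mentions nationalities, names, or any other group of people; any difference in scores comes entirely from associations baked into the embeddings.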
The goal of our de-biasing method is to do what we can to stop word embeddings from being a source of prejudice, undoing some of the known effects of historical and representation bias. Machine-learned prejudice can still come from a different source, and affect your machine learning at a different stage, but at least it’s not coming from this obvious source.
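For the curious, here’s a minimal sketch of one common family of de-biasing techniques: estimating a “bias direction” from paired words and projecting it out, in the spirit of Bolukbasi et al. (2016). It illustrates the idea, not the exact adjustments that go into ConceptNet Numberbatch, which cover more than gender:

```python
# A minimal sketch of one common de-biasing idea: estimate a "bias direction"
# from paired words and remove that component from other vectors. This is in
# the spirit of Bolukbasi et al. (2016), not the exact method used for
# ConceptNet Numberbatch.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

def normalize(v):
    return v / np.linalg.norm(v)

# Estimate a gender direction from a few definitional word pairs.
pairs = [("he", "she"), ("man", "woman"), ("his", "her")]
direction = normalize(sum(normalize(vectors[a] - vectors[b]) for a, b in pairs))

def debias(word):
    """Remove the component of a word's vector that lies along the bias direction."""
    v = vectors[word]
    return v - np.dot(v, direction) * direction

# "surgeon" shouldn't carry a gender component; after projection, it doesn't.
before = np.dot(normalize(vectors["surgeon"]), direction)
after = np.dot(normalize(debias("surgeon")), direction)
print(f"gender component before: {before:+.3f}, after: {after:+.3f}")
```

Projection like this only removes the component you know how to measure, and later work has shown that bias can remain recoverable from what’s left, which is one more reason to treat de-biasing as harm reduction rather than a cure.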
So, did it work?
You might not want to just take our word for it. This paper by Chris Sweeney and Maryam Najafian establishes a new method for measuring demographic bias in word embeddings. It confirms that ConceptNet Numberbatch does a lot better than default GloVe.
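If you want a feel for what this kind of measurement looks like, here’s a simplified association probe, loosely inspired by WEAT-style tests rather than the exact metric from that paper. Loading a different model (ConceptNet Numberbatch, for example) in place of GloVe lets you compare embeddings:

```python
# A simplified association probe, loosely inspired by WEAT-style tests
# (not the exact metric from the Sweeney & Najafian paper). It compares how
# strongly two small sets of first names associate with pleasant vs.
# unpleasant words; swap in a different embedding model to compare.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

pleasant = ["love", "peace", "wonderful", "friend", "happy"]
unpleasant = ["hatred", "war", "awful", "enemy", "sad"]

def association(word):
    """Mean similarity to pleasant words minus mean similarity to unpleasant words."""
    pos = np.mean([vectors.similarity(word, p) for p in pleasant])
    neg = np.mean([vectors.similarity(word, u) for u in unpleasant])
    return pos - neg

for name_set in (["emily", "anne", "claire"], ["jamal", "tyrone", "malik"]):
    scores = [association(name) for name in name_set]
    print(name_set, "mean association:", round(float(np.mean(scores)), 3))
```

A fair embedding would give the two sets of names about the same mean association; the size of the gap is one rough indicator of demographic bias.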
We’ll leave you with this: the term “de-biasing” is somewhat overpromising. It suggests that there’s an algorithm that can take in human-generated data with all its flaws and biases, and produce output that is unbiased, fair, and objective. No algorithm can do that. There is no algorithmic test for objective truth or fairness. There’s only what we can design an algorithm to do, with the data it has, to meet our goals – and one of those goals can be to stop perpetuating certain harms.