Debugging biased algorithms, in health care and beyond

Earlier this term, I found myself in an ice cream shop late at night, eavesdropping on a group of students working on a data science problem set. Because UC Berkeley students end up in influential roles in companies around the Bay Area and beyond, the odds are that some of them will work in jobs designing and maintaining software that touches millions of lives. And if they do, every day they will make choices just like the ones they were making for that problem set: how to define a variable, whether to drop missing values and more.

These choices can feel small, technical and abstract. They are anything but: they are major vectors for bias to creep into algorithms. But they don’t have to be. As my research shows, by making these choices carefully, we can throw out only the bathwater: all the benefits of algorithms, without the bias.

Two weeks ago, my colleagues and I published an article showing that a widely used algorithm in the health system was biased against Black patients. This algorithm and others like it affect decisions for hundreds of millions of patients every year. Fixing the algorithm’s bias would more than double the number of Black patients eligible for a program that can help them with their health needs.

The bias has its roots in something that probably felt like a small, technical choice to those who developed the algorithm: The choice of which variable in their dataset to predict.

The algorithm is used to identify patients whose health is at risk of deteriorating — patients we’d like to know about early so we can help catch problems before they get serious. All this sounds clear enough to us humans, but it wouldn’t be clear to an algorithm. Algorithms are mercilessly literal; they don’t understand concepts — they understand only operations on variables.

So to predict a patient’s risk, the algorithm developers decided to have the algorithm predict a patient’s cost.
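To see how small that choice looks in practice, here is a minimal sketch of a typical modeling workflow in Python. Everything in it is hypothetical: the file name, the columns and the model are stand-ins I made up for illustration, not the actual algorithm we studied.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical claims data: one row per patient-year.
claims = pd.read_csv("claims.csv")

# Hypothetical predictors; a real system would use many more.
features = ["age", "num_visits", "num_active_meds"]
X = claims[features]

# The fateful line: total cost stands in for "health risk."
y = claims["total_cost"]

risk_model = GradientBoostingRegressor().fit(X, y)
claims["risk_score"] = risk_model.predict(X)
```

Everything else in a pipeline like this can be done impeccably; the single line that picks the target variable is where the trouble starts.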

What made the problem we studied so perverse is that anyone could have made that choice, and many did: not just other algorithm developers but also hospitals, insurers and government agencies. That’s because cost seems like a good proxy for illness, as illness leads to health care and health care leads to costs. To the data analyst, cost is also attractive because it’s available in nearly all health care datasets. It’s all very logical and very convenient.

The problem is that while cost is a reasonable proxy for health, it’s also a biased one. For many heartbreaking reasons, Black patients end up generating lower costs than White patients with the same health needs. Some of these reasons have to do with larger inequalities in our society: it’s hard to get health care, even when you’re insured (like the patients we studied), because you need to get to the hospital and take a few hours off work. Other reasons have to do specifically with the color of someone’s skin. A striking recent study from Oakland showed that Black patients randomly assigned to a Black doctor had a far higher uptake of recommended primary care interventions, especially those that involved vaccines.

All this means that an algorithm that predicts costs accurately — and this algorithm did so very accurately, for Black and White patients alike — would automate injustice and racism.

This is bad news. But it’s not the end of our story. After seeing these results, we contacted the algorithm manufacturer. They were surprised, but they were also very motivated to work with us to fix the problem. And while understanding the mechanism of bias — the deep inequalities built into health datasets — was difficult, the solution was trivial: Predict another variable. In our experiments with the manufacturer, simply predicting a less biased proxy for health — the number of chronic conditions that flare up in a given year — reduced bias by 84%.
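To make that concrete, here is what the fix looks like in the same hypothetical sketch as before: same features, same model, only the target changes. The column names, the 97th-percentile cutoff and the simple bias measure below are illustrative stand-ins, not the fuller audit we ran with the manufacturer.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def share_black_among_flagged(scores, is_black, pct=97):
    """Of the patients flagged as highest risk, what share are Black?"""
    flagged = scores >= np.percentile(scores, pct)
    return is_black[flagged].mean()

is_black = (claims["race"] == "black").to_numpy()  # hypothetical race column

# Train the same model on two different proxies for health and compare
# who ends up flagged for the extra-help program.
for target in ["total_cost", "active_chronic_conditions"]:
    model = GradientBoostingRegressor().fit(X, claims[target])
    scores = model.predict(X)
    print(target, share_black_among_flagged(scores, is_black))
```

Swapping the label is a one-line change to the code; the hard part is knowing which label encodes the inequality and which one does not.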

The process of finding and solving this problem felt a bit like debugging code. But this is a different kind of bug. This one didn’t cause the program to crash — it looked like it was working fine. None of the basic diagnostics would have turned it up because the algorithm was doing exactly what it was supposed to be doing, predicting cost. The problem was deeper, in what we asked the algorithm to do in the first place.
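What would have caught it is a different kind of check: hold the algorithm’s score fixed and compare a direct measure of health across groups. Continuing the hypothetical sketch above, that audit takes only a few lines; the columns are the made-up ones from earlier, not our actual data.

```python
# Group patients by the cost-based score, then compare chronic-condition
# counts across races within each risk decile.
claims["risk_decile"] = pd.qcut(claims["risk_score"], 10, labels=False)

audit = (
    claims.groupby(["risk_decile", "race"])["active_chronic_conditions"]
          .mean()
          .unstack("race")
)
print(audit)
```

If Black patients turn out to be sicker than White patients at every level of predicted risk, the score is biased no matter how accurately it predicts cost.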

Catching these bugs requires a special kind of skill set: fluency in the language of data and in the language of the field that generates the data. It requires being bilingual.

If you’re a data scientist working on any problem that touches people’s lives — in other words, if you’re a data scientist — you must speak the language of the field you’re studying. Data science is more than knowing how to manipulate the dataset in front of you. It’s about knowing how these data were created, what complex historical and social forces shaped them and how to work around them as you build your algorithms.

If you’re in a field that will be transformed by algorithms over the coming years — in other words, if you’re in a field — you need to speak the language of data. We desperately need algorithms for our society’s most critical challenges: building a medical system that works for everyone, measuring and fighting climate change, creating policies that help people who need help the most. Those of us in medicine, law, economics, environmental science and many other fields must start taking responsibility for building algorithms to solve our own problems. We need to move beyond the simple truth that “algorithms are biased” to understanding the mechanisms that create bias and what to do about them.

The most important lesson I learned from my own research is that algorithms can do miraculous things or they can do great harm; they can be biased or unbiased; they can reinforce inequalities or redress them. Which it will be depends on a million small, technical choices, and the wonderful and terrifying news is that it’s all up to us.

Ziad Obermeyer is an acting associate professor of health policy and management at the UC Berkeley School of Public Health.