Doctors, data scientists and hospital executives believe artificial intelligence may help solve what until now have been intractable problems. AI is already showing promise to help clinicians diagnose breast cancer, read X-rays and predict which patients need more care. But as excitement grows, there’s also a risk: These powerful new tools can perpetuate long-standing racial inequities in how care is delivered.
“If you mess this up, you can really, really harm people by entrenching systemic racism further into the health system,” said Dr. Mark Sendak, a lead data scientist at the Duke Institute for Health Innovation.
These new health care tools are often built using machine learning, a subset of AI where algorithms are trained to find patterns in large data sets like billing information and test results. Those patterns can predict future outcomes, like the chance a patient develops sepsis. These algorithms can constantly monitor every patient in a hospital at once, alerting clinicians to potential risks that overworked staff might otherwise miss.
The data these algorithms are built on, however, often reflect inequities and bias that have long plagued U.S. health care. Research shows clinicians often provide different care to white patients and patients of color. Those differences in how patients are treated get immortalized in data, which are then used to train algorithms. People of color are also often underrepresented in those training data sets.
“When you learn from the past, you replicate the past. You further entrench the past,” Sendak said. “Because you take existing inequities and you treat them as the aspiration for how health care should be delivered.”
A landmark 2019 study published in the journal Science found that an algorithm used to predict health care needs for more than 100 million people was biased against Black patients. The algorithm relied on health care spending to predict future health needs. But with less access to care historically, Black patients often spent less. As a result, Black patients had to be much sicker to be recommended for extra care under the algorithm.
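The mechanism described above can be shown with a toy example. This is a minimal sketch, not the algorithm from the study: the group names, the dollar figures and the spending threshold are all made up for illustration. It assumes two groups with identical health needs, where one group spends less per chronic condition because of reduced access to care, and a rule that flags patients for extra care based on spending alone.

```python
# Hypothetical dollars spent per chronic condition in each group (not real data).
COST_PER_CONDITION = {"more_access": 1000, "less_access": 700}

# Both groups have the same range of illness: 0 to 6 chronic conditions.
patients = [
    {"group": group, "conditions": n, "spending": n * cost}
    for group, cost in COST_PER_CONDITION.items()
    for n in range(7)
]


def flagged_for_extra_care(patients, spending_threshold):
    """Flag patients whose spending meets the threshold (spending as a proxy for need)."""
    return [p for p in patients if p["spending"] >= spending_threshold]


def min_conditions_to_qualify(flagged, group):
    """Return the fewest chronic conditions among flagged patients in a group."""
    return min(p["conditions"] for p in flagged if p["group"] == group)


flagged = flagged_for_extra_care(patients, spending_threshold=3000)
min_more_access = min_conditions_to_qualify(flagged, "more_access")
min_less_access = min_conditions_to_qualify(flagged, "less_access")

print("Conditions needed to qualify (more access):", min_more_access)
print("Conditions needed to qualify (less access):", min_less_access)
```

In this toy setup the rule never looks at group membership, yet patients in the lower-spending group need more chronic conditions before they are flagged, which is the pattern the study reported.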
“You’re essentially walking where there’s land mines,” Sendak said of trying to build clinical AI tools using data that may contain bias, “and [if you’re not careful] your stuff’s going to blow up and it’s going to hurt people.”
The challenge of rooting out racial bias
In the fall of 2019, Sendak teamed up with pediatric emergency medicine physician Dr. Emily Sterrett to develop an algorithm to help predict childhood sepsis in Duke University Hospital’s emergency department.
Sepsis occurs when the body overreacts to an infection and attacks its own organs. While rare in children — roughly 75,000 annual cases in the U.S. — this preventable condition is fatal for nearly 10% of kids. If caught quickly, antibiotics effectively treat sepsis. But diagnosis is challenging because typical early symptoms — fever, high heart rate and high white blood cell count — mimic other illnesses including the common cold.
An algorithm that could predict the threat of sepsis in kids would be a game changer for physicians across the country. “When it’s a child’s life on the line, having a backup system that AI could offer to bolster some of that human fallibility is really, really important,” Sterrett said.
But the groundbreaking study in Science about bias reinforced for Sendak and Sterrett the need to be careful in their design. The team spent a month teaching the algorithm to identify sepsis based on vital signs and lab tests instead of easily accessible but often incomplete billing data. Any tweak to the program over the first 18 months of development triggered quality control tests to ensure the algorithm found sepsis equally well regardless of race or ethnicity.
But nearly three years into their intentional and methodical effort, the team discovered possible bias still managed to slip in. Dr. Ganga Moorthy, a global health fellow with Duke’s pediatric infectious diseases program, showed the developers research indicating that doctors at Duke took longer to order blood tests for Hispanic kids eventually diagnosed with sepsis than for white kids.
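One common way to run the kind of check described above is to compare, group by group, the share of true sepsis cases the algorithm flagged. The sketch below is an assumption about what such a test could look like, not the Duke team's actual code; the record layout, the group labels and the sample data are all hypothetical.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return the share of true sepsis cases the model flagged, per group.

    Each record is a (group, had_sepsis, model_flagged) tuple.
    """
    caught = defaultdict(int)
    total = defaultdict(int)
    for group, had_sepsis, model_flagged in records:
        if had_sepsis:
            total[group] += 1
            if model_flagged:
                caught[group] += 1
    return {group: caught[group] / total[group] for group in total}


def max_sensitivity_gap(records):
    """Return the largest difference in sensitivity between any two groups."""
    rates = sensitivity_by_group(records)
    return max(rates.values()) - min(rates.values())


# Made-up records for illustration only.
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, False), ("group_a", True, True),
    ("group_b", True, True), ("group_b", True, False),
    ("group_b", True, False), ("group_b", True, True),
    ("group_a", False, False), ("group_b", False, True),
]

rates = sensitivity_by_group(records)
gap = max_sensitivity_gap(records)
print(rates, gap)
```

A team could rerun a check like this after every change and investigate whenever the gap exceeds a tolerance it has chosen in advance. As the rest of this story shows, a check on how often sepsis is caught would not by itself reveal differences in how quickly it is caught.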
“One of my major hypotheses was that physicians were taking illnesses in white children perhaps more seriously than those of Hispanic children,” Moorthy said. She also wondered if the need for interpreters slowed down the process.
“I was angry with myself. How could we not see this?” Sendak said. “We totally missed all of these subtle things that if any one of these was consistently true could introduce bias into the algorithm.”
Sendak said the team had overlooked this delay, potentially teaching their AI inaccurately that Hispanic kids develop sepsis more slowly than other kids, a time difference that could be fatal.
Regulators are taking notice
Over the last several years, hospitals and researchers have formed national coalitions to share best practices and develop “playbooks” to combat bias. But signs suggest few hospitals are reckoning with the equity threat this new technology poses.
Researcher Paige Nong interviewed officials at 13 academic medical centers last year, and only four said they considered racial bias when developing or vetting machine learning algorithms.
“If a particular leader at a hospital or a health system happened to be personally concerned about racial inequity, then that would inform how they thought about AI,” Nong said. “But there was nothing structural, there was nothing at the regulatory or policy level that was requiring them to think or act that way.”
Several experts say the lack of regulation leaves this corner of AI feeling a bit like the “wild west.” Separate 2021 investigations found the Food and Drug Administration’s policies on racial bias in AI to be uneven, with only a fraction of algorithms even including racial information in public applications.
The Biden administration over the last 10 months has released a flurry of proposals to design guardrails for this emerging technology. The FDA says it now asks developers to outline the source of data underpinning new algorithms and any steps taken to mitigate bias.
The Office of the National Coordinator for Health Information Technology proposed new regulations in April that would require developers to share with clinicians a fuller picture of what data were used to build algorithms. Kathryn Marchesini, the agency’s chief privacy officer, described the new regulations as a “nutrition label” that helps doctors know “the ingredients used to make the algorithm.” The hope is more transparency will help providers determine if an algorithm is unbiased enough to safely use on patients.
The Office for Civil Rights at the U.S. Department of Health and Human Services last summer proposed updated regulations that explicitly forbid clinicians, hospitals and insurers from discriminating “through the use of clinical algorithms in [their] decision-making.” The agency’s director, Melanie Fontes Rainer, said while federal anti-discrimination laws already prohibit this activity, her office wanted “to make sure that [providers and insurers are] aware that this isn’t just ‘Buy a product off the shelf, close your eyes and use it.'”
Industry welcoming — and wary — of new regulation
Many experts in AI and bias welcome this new attention, but there are concerns. Several academics and industry leaders said they want to see the FDA spell out in public guidelines exactly what developers must do to prove their AI tools are unbiased. Others want ONC to require developers to share their algorithm “ingredient list” publicly, allowing independent researchers to evaluate code for problems.
Some hospitals and academics worry these proposals — especially HHS’s explicit prohibition on using discriminatory AI — could backfire. “What we don’t want is for the rule to be so scary that physicians say, ‘OK, I just won’t use any AI in my practice. I just don’t want to run the risk,'” said Carmel Shachar, executive director of the Petrie-Flom Center for Health Law Policy at Harvard Law School. Shachar and several industry leaders said that without clear guidance, hospitals with fewer resources may struggle to stay on the right side of the law.
Duke’s Mark Sendak welcomes new regulations to eliminate bias from algorithms, “but what we’re not hearing regulators say is, ‘We understand the resources that it takes to identify these things, to monitor for these things. And we’re going to make investments to make sure that we address this problem.'”
The federal government invested $35 billion to entice and help doctors and hospitals adopt electronic health records earlier this century. None of the regulatory proposals around AI and bias include financial incentives or support.
‘You have to look in the mirror’
A lack of additional funding and clear regulatory guidance leaves AI developers to troubleshoot their own problems for now.
At Duke, the team immediately began a new round of tests after discovering their algorithm to help predict childhood sepsis could be biased against Hispanic patients. It took eight weeks to conclusively determine that the algorithm predicted sepsis at the same speed for all patients. Sendak hypothesizes there were too few sepsis cases for the time delay for Hispanic kids to get baked into the algorithm.
Sendak said the conclusion was more sobering than a relief. “I don’t find it comforting that in one specific rare case, we didn’t have to intervene to prevent bias,” he said. “Every time you become aware of a potential flaw, there’s that responsibility of [asking], ‘Where else is this happening?'”
Sendak plans to build a more diverse team, with anthropologists, sociologists, community members and patients working together to root out bias in Duke’s algorithms. But for this new class of tools to do more good than harm, Sendak believes the entire health care sector must address its underlying racial inequity.
“You have to look in the mirror,” he said. “It requires you to ask hard questions of yourself, of the people you work with, the organizations you’re a part of. Because if you’re actually looking for bias in algorithms, the root cause of a lot of the bias is inequities in care.”
This story comes from the health policy podcast Tradeoffs. Dan Gorenstein is Tradeoffs’ executive editor, and Ryan Levi is a senior producer for the show. Tradeoffs’ coverage of diagnostic excellence is supported in part by the Gordon and Betty Moore Foundation.