Considering race in colon cancer prediction reduces disparities

Taking race into account when developing tools to predict a patient’s risk of colorectal cancer leads to more accurate predictions than race-blind algorithms, researchers find.

While many medical researchers have argued that race should be removed as a factor from clinical algorithms that predict disease risks, a new study finds that, at least for colorectal cancer, including race can help correct a data issue – inaccurate recording of family history for Black patients.

Having relatives with colorectal cancer is a known risk factor for the disease, but Black patients are less likely to have an accurate recorded history in their medical records. Considering race can help correct for this, potentially identifying more Black patients who would benefit from cancer screening.

“If you don't use race, what you're effectively doing is you’re telling your algorithm, pretend that family history is equally useful for everyone, and that’s just not true in the data,” said Emma Pierson, senior author on the new study and the Andrew H. and Ann R. Tisch Assistant Professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech and in the Cornell Ann S. Bowers College of Computing and Information Science. 

She collaborated with Anna Zink of the University of Chicago and Ziad Obermeyer of the University of California, Berkeley on the new research, “Race Adjustments in Clinical Algorithms Can Help Correct For Racial Disparities In Data Quality,” which was published Aug. 13 in Proceedings of the National Academy of Sciences.

To evaluate the impact of race on clinical algorithms for colorectal cancer, the researchers predicted the future risk of cancer for 77,836 racially and economically diverse participants in the Southern Community Cohort, a National Cancer Institute (NCI)-funded initiative aimed at understanding the causes of cancer and other major diseases. The participants, aged 40 to 74, had no medical history of colorectal cancer when they joined the cohort.

The research team developed a pair of algorithms for predicting colorectal cancer risk: one that included race, and another that did not. They used risk factors included in the NCI Colorectal Cancer Risk Assessment Tool, which considers a person’s age, weight, diet, exercise habits and family and medical history. Then they compared how well the two algorithms could predict colorectal cancer risk for Black and white participants.
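The comparison the researchers describe can be illustrated with a minimal sketch. This is not the study's actual code or data: it fits two logistic-regression risk models on synthetic data in which a "family history" flag is recorded less reliably for one hypothetical group, then checks each model's average predicted versus observed risk per group. All variable names and simulation parameters here are illustrative assumptions.

```python
# Minimal sketch (synthetic data, not the study's code): compare a
# "race-blind" and a "race-adjusted" risk model when one group's family
# history is under-recorded.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20000
group = rng.integers(0, 2, n)            # 0 / 1 = two hypothetical groups
true_fh = rng.random(n) < 0.15           # true family history (same rate in both groups)
# Simulate less reliable recording for group 1: half of true histories lost
recorded_fh = np.where((group == 1) & (rng.random(n) < 0.5), 0, true_fh.astype(int))
age = rng.uniform(40, 75, n)
# Outcome depends on TRUE family history and age
logit = -6 + 0.05 * age + 1.5 * true_fh
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_blind = np.column_stack([age, recorded_fh])
X_aware = np.column_stack([age, recorded_fh, group, recorded_fh * group])

blind = LogisticRegression(max_iter=1000).fit(X_blind, y)
aware = LogisticRegression(max_iter=1000).fit(X_aware, y)

p_blind = blind.predict_proba(X_blind)[:, 1]
p_aware = aware.predict_proba(X_aware)[:, 1]

# Group-wise calibration check: mean predicted vs. observed risk
for name, p in [("blind", p_blind), ("adjusted", p_aware)]:
    for g in (0, 1):
        m = group == g
        print(f"{name} model, group {g}: "
              f"predicted {p[m].mean():.4f}, observed {y[m].mean():.4f}")
```

On data simulated this way, the race-blind model tends to underpredict risk for the group whose family history is under-recorded, while the group-aware model's interaction term lets it discount the unreliable flag for that group, mirroring the qualitative pattern the study reports.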

The analysis showed that Black participants were more likely to report an unknown family history of cancer, suggesting this data might be less reliably recorded. Consistent with this finding, family history data was less helpful for predicting future cancer risk for Black participants.

Researchers found that the race-blind algorithm underpredicted cancer risk for Black participants and overpredicted the risk for white participants, while the race-adjusted algorithm more accurately predicted risk for each group. When the algorithm accounted for race, 74.4% of participants ranked in the half with the highest risk were Black, compared to 66.1% with the race-blind algorithm. Individuals classified as high-risk may have better access to screening or other health care services.

There are important reasons to reconsider the use of race in medical algorithms, Pierson said. Many of these algorithms rely on outdated data, include false and biased beliefs about race or yield results that exacerbate health disparities. Some likely have led to Black patients being denied critical health care, such as kidney transplants, osteoporosis treatments or appropriate breast cancer screening.

The study by Pierson and her colleagues highlights the importance of comparing race-adjusted and race-blind algorithms before taking race out of the equation. 

“A concern is that the removal of race might have unintended consequences, and we need to very carefully evaluate its impact,” Pierson said.

While tailoring algorithms with regard to race can sometimes lead to more accurate predictions by compensating for imperfect data, Pierson sees this as a stopgap measure.

“We need to design algorithms that make the best predictions we can for the patients we see today. That is our responsibility as designers,” Pierson said. “But on a longer time frame, it’s also really important that, yes, we improve the quality of medical data. This is not an acceptable state of affairs – we need to fight a two-front battle.”

Pierson is also an assistant professor of population health sciences at Weill Cornell Medical College.

Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.

Media Contact

Becka Bowyer