Substituting genetic ancestry for race in research? Not so fast

Race, widely used as a variable across biomedical research and medicine, is an appropriate proxy for racism, but not for anything biological. Proposals to use genetic ancestry instead of race are at risk of perpetuating the same problems.

Dozens of algorithms widely used in clinical care contain an adjustment factor for a patient’s race. When estimating kidney function, for example, different results are returned depending on whether the patient’s race is entered as “Black” or “non-Black,” though at least for kidney function the use of race is being challenged. Some medications have been approved only for those of certain self-identified racial groups. Meanwhile in research, the race of participants is routinely considered at almost every step of the research process — from recruitment to analysis to the interpretation of findings.

Race-based health disparities have reinvigorated the debate about whether these uses of race are appropriate, and about their potential connection to racism.

To be sure, race is an important variable to track in order to understand the social drivers of health, including the impact of racism. But it is a highly problematic proxy for anything biological.

In an attempt to think about how better to capture possibly relevant biological differences between groups, one common proposal is to turn to concepts from genetics, and in particular to genetic ancestry.

But using genetic ancestry risks perpetuating the same problems as relying on race, as several colleagues and I argue in a Policy Forum essay in Science magazine. We argue that genetic ancestry can be part of the solution to understanding our differing risks of developing disease and our differing responses to therapies, but only if a suitably complex conceptualization of it is adopted.

The danger in turning to genetic ancestry stems from the dominant way ancestry is currently used within genetics, as continental categories such as African ancestry, European ancestry, and the like. These categories are easy to conflate with racial categories. European ancestry, for example, is conflated with “white” race. This confuses a sociopolitical concept with a biological one. This well-meaning “solution” ends up perpetuating the same problem inherent in racial categories: that humans can be sorted on the basis of their biology into a small number of types. Such beliefs have been the source of great harm. This provides an ethical imperative to move away from the use of continental ancestry categories.

There is also a scientific imperative to move away from their use. Here is what is meant by genetic ancestry: An individual’s genetic ancestry consists of the paths through their family tree by which they have inherited each segment of their DNA. Population categories are not integral to this definition; imposing any set of categories is a choice that researchers must make and justify.

There are good reasons not to impose continental ancestry categories.

Continental ancestry categories fail to adequately capture human diversity. Newly assembled datasets, such as those referenced in Science, highlight that there are no distinct categories of genetic variability, only blurred continuities. Recent high-profile studies in statistical genetics have shown that, in many cases where the use of population categories was previously considered necessary, categories can be avoided entirely. When basic and translational researchers can avoid categories, they should do so.

Continental ancestry categories also give a very incomplete picture of our ancestries. Each of us has ancestors from every point in our species’ past. One set of ancestry categories reflects just one point in that past, so this multidimensional historical picture is flattened when just one set of categories is referred to.

New data increasingly allows us to explore different time slices. The human species, for example, was interbreeding with Neanderthals 50,000 years ago. The best model suggests that three different human groups intermingled in Europe 5,000 years ago to forge modern-day Europeans. Five hundred years ago, waves of migration and the trade in enslaved peoples were creating new patterns of genetic diversity in the Americas. These different time slices can be of medical relevance. For example, one of the main genetic variants found to be linked to Covid-19 severity was later associated with a genomic region humans inherited from Neanderthals. As researchers try to understand the relevance of humans’ genetic backgrounds, they should routinely consider multiple sets of categories, representing multiple time slices.

A consideration of the values, ethics, and purpose of human biological research should force researchers and those who apply the results of this research to move away from easy categorization and adopt a more complex version of genetic ancestry, one that reflects the continuous nature of genetic variation and its historical depth. Change is never easy. At a minimum, to achieve this the research community will need new, widely available software tools to enable the use of categories representing multiple time slices, as well as educational materials for researchers, scientists, and clinicians. And publishers and funders will need to reconsider what types of work they will promote.

The willingness of academic and health care institutions to re-examine their use of race presents a window of opportunity to move away from the use of race as a biological variable. To make the most of this opportunity, they must adopt a complex conceptualization of genetic ancestry and not allow continental labels that reassert prior racial groups under an ostensibly race-blind language to become the new default.

Anna C. F. Lewis is a research associate for the E. J. Safra Center for Ethics at Harvard University.
