About once a week I get an email from a stranger asking me to interpret DNA test results.
• A young woman wants to know why ancestry testing counters her strong sense of identity as African.
• A pregnant woman wonders what having a fragile X “premutation” means for her future child.
• Paternity test results are a surprise – are the findings accurate?
What these cases share is that when I asked if I could write about them, with identifying details changed, the immediate answer from all was NO!
Despite our tendencies towards telling all on social media, DNA data seem to fall into a different category, mostly because of fear of misuse of the information. With that in mind, a group of cryptographers, biologists, and computer scientists at Stanford University recently described in Science magazine a new computational tool “to make certain that genomic discrimination doesn’t happen.”
It’s a cloaking device of sorts, like the Romulan contraption on Star Trek that made their ships seemingly vanish. The patient controls which DNA information goes to health care providers. “In this way, no person or computer, other than the individuals themselves, has access to the complete set of genetic information,” said co-author Gill Bejerano, PhD, associate professor of developmental biology, pediatrics, and computer science.
Brief history of genome privacy
The conundrum surrounding genome privacy is that researchers need to consult many genomes from healthy people to discover the genetic underpinnings of diseases, especially rare ones, yet every genome shared is a potential privacy risk.
We’re 99.5% alike in genome sequence. To figure out what a group of individuals with the same rare set of symptoms has, their genomes need to be compared to those of vast hordes of people who don’t have the syndrome. Finally, comparisons whittle down the data to mutations in a single gene that the sick people share but the healthy people don’t, which could explain the illness.
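For readers who like to see the logic spelled out, the whittling-down step can be sketched as simple set arithmetic. This is only an illustration under stated assumptions: the function name `candidate_variants` and the variant labels are made up, and real pipelines weigh many more factors than presence or absence.

```python
def candidate_variants(patients, controls):
    """Return variants carried by every patient and by no healthy control.

    patients, controls: lists of sets of variant labels (hypothetical format).
    """
    if not patients:
        return set()
    # Variants that all of the sick people share.
    shared = set.intersection(*(set(p) for p in patients))
    # Variants seen in at least one healthy person.
    seen_in_controls = set().union(*(set(c) for c in controls))
    return shared - seen_in_controls


# Toy data: these labels are invented, not real mutations.
patients = [
    {"GENE_A:v1", "GENE_B:v7", "GENE_C:v2"},
    {"GENE_A:v1", "GENE_C:v2", "GENE_D:v5"},
    {"GENE_A:v1", "GENE_C:v2"},
]
controls = [
    {"GENE_C:v2", "GENE_D:v5"},
    {"GENE_B:v7"},
]

print(candidate_variants(patients, controls))  # {'GENE_A:v1'}
```

The more healthy genomes in `controls`, the more harmless shared variants get ruled out, which is why researchers want so many of them.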
Concerns about genome privacy go back at least 20 years, to the Universal Declaration on the Human Genome and Human Rights. Then in 2001, just as the first drafts of human genomes were being published, came the HapMap project, providing a temporary shortcut while sequencing costs were still in the stratosphere.
“Haplotype maps” peppered human genomes with half a million markers, aka single nucleotide polymorphisms (SNPs). A person with cystic fibrosis would have a different SNP pattern than someone who doesn’t have it. SNPs were then gathered into genome-wide association studies (GWAS), which remain a shortcut to gene identification.
In 2008 came the Genetic Information Nondiscrimination Act, GINA. Employers couldn’t use genetic information to hire, fire, or promote an employee, nor could they require genetic testing. Health insurers couldn’t require genetic tests or use results to deny coverage.
A ridiculously named bill could severely undermine GINA. It would enable an employer to grill an employee not only about personal health but also about the genetic health of relatives, and it would allow companies to compel employees to take genetic tests or pay a fine if they refuse. It’s a cruel bill, even more so now that genome sequencing costs have plummeted as information technology has bloomed.
Discovering a threat to genome privacy
Today finding identifying details about someone is near-instantaneous, given Google. Five years ago, Melissa Gymrek, then working on her PhD at the Whitehead Institute, became concerned about the ease of assigning a name to a DNA sequence.
She was onto something, specifically the 1000 Genomes Project. From 2008 to 2015 it spawned a database, but one that had a comforting “. . . it will be hard for anyone to find out anything about you personally from any of this research” in the informed consent people had to sign before providing their DNA data.
Doubtful, Gymrek and her co-workers looked on Y chromosomes for combinations of short tandem repeats (STRs), repeated units 2 to 13 bases long that underlie forensic testing and genetic genealogy. Then they found these Y signatures in public genealogy databases that included surnames. Bingo!
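To get a feel for why this works, here is a minimal sketch of the matching idea, not the researchers' actual method. The marker names (`STR_1` and so on), repeat counts, surnames, and the function `matching_surnames` are all hypothetical, invented for illustration.

```python
def matching_surnames(profile, database, max_mismatches=0):
    """Return surnames whose recorded Y-STR repeat counts match the profile.

    profile: dict mapping marker name -> repeat count
    database: list of (surname, markers_dict) records
    """
    hits = set()
    for surname, markers in database:
        # Compare only the markers present in both records.
        common = set(profile) & set(markers)
        if not common:
            continue
        mismatches = sum(profile[m] != markers[m] for m in common)
        if mismatches <= max_mismatches:
            hits.add(surname)
    return sorted(hits)


# Invented example data.
anonymous_profile = {"STR_1": 14, "STR_2": 24, "STR_3": 11}
genealogy_db = [
    ("Alder", {"STR_1": 14, "STR_2": 24, "STR_3": 11}),
    ("Birch", {"STR_1": 14, "STR_2": 23, "STR_3": 11}),
    ("Cedar", {"STR_1": 15, "STR_2": 22, "STR_3": 10}),
]

print(matching_surnames(anonymous_profile, genealogy_db))     # ['Alder']
print(matching_surnames(anonymous_profile, genealogy_db, 1))  # ['Alder', 'Birch']
```

Because Y chromosomes and surnames both tend to pass from father to son, a handful of repeat counts can be enough to suggest a family name.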
They similarly traced X chromosome sequences through a cell bank. Details like birth year and state of residence were easy to find, and Facebook and family websites filled in even more identifying characteristics. Meanwhile, in the rare disease community, alarm bells sounded about the ease of identifying children through searching mutation databases for disease, hometown, and date of birth.
In short, it was way too easy to connect a short DNA sequence to a particular person. After the first 50 hits, Gymrek and her advisor alerted the National Institutes of Health and published “Identifying Personal Genomes by Surname Inference” in Science. The era of DNA privacy had officially arrived, even more relevant today when we can store our genome sequences on our smartphones.
How it works
The Stanford approach to genome analysis enables a user to zero in on only the part of a genome that’s relevant to symptoms, and send only that info to a genomics-savvy health care provider. Using it doesn’t sound much more complicated than following other online directions, like planning a trip: you don’t have to know how the coding works. Meanwhile, the healthy genomes to which those from sick people are compared are combined and stripped of identifying information.
The researchers discuss three types of situations in which their new tool analyzed only a small percentage of patients’ genome information to:
• Nail mutations behind four rare diseases in under 10 seconds
• Diagnose a baby by comparing the DNA to that of the parents in under an hour
• Figure out which patients at two hospitals actually had the same diagnosis
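How can a computation find a shared culprit gene without anyone seeing a whole genome? The article doesn't spell out the cryptography, so the sketch below swaps in a simple, well-known stand-in called additive secret sharing. It is not presented as the Stanford team's protocol. Each patient marks each gene 1 or 0 (rare variant present or not), splits every mark into random-looking shares, and sends one share to each of several servers. No single server learns anything, but the combined totals reveal how many patients share a hit in each gene. The function names, gene labels, and three-server setup are all assumptions.

```python
import secrets

PRIME = 2_147_483_647  # modulus for share arithmetic (2**31 - 1)


def split(value, n_servers):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_gene_counts(patient_vectors, n_servers=3):
    """Count carriers per gene from secret-shared 0/1 vectors."""
    n_genes = len(patient_vectors[0])
    # server_totals[s][g] accumulates the shares server s receives for gene g.
    server_totals = [[0] * n_genes for _ in range(n_servers)]
    for vector in patient_vectors:
        for g, bit in enumerate(vector):
            for s, share in enumerate(split(bit, n_servers)):
                server_totals[s][g] = (server_totals[s][g] + share) % PRIME
    # Only the combined per-gene totals are ever reconstructed.
    return [sum(t[g] for t in server_totals) % PRIME for g in range(n_genes)]


# Invented example: three patients, three hypothetical genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]
patients = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]

counts = secure_gene_counts(patients)
print(dict(zip(genes, counts)))  # {'GENE_A': 3, 'GENE_B': 1, 'GENE_C': 1}
```

Here all three patients carry a rare variant in the hypothetical GENE_A, making it the prime suspect, yet no server ever held an individual patient's list.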
The data aren’t just streams of A, T, C, and G, but an encrypted evaluation of each variant of each of the 20,663 genes. Could the gene’s function explain symptoms? Is the gene variant rare? A disease-causing mutation is more likely to be rare because if it’s common, it can’t be making people too sick to reproduce. That’s why comparison to a million or more sequenced genomes of healthy people is important.
Most exciting is the ability to track more than one gene at a time. Imagine knowing that you’ve inherited a gene variant that increases the risk of Alzheimer’s, like APOE e4, but not knowing that you’ve also inherited a gene variant that lowers the risk (APOE e2). Years of unnecessary stress may follow from such limited genetic information, which is why a clinical diagnosis is still based on symptoms or other types of test results.
With the privacy protections of the 2008 Genetic Information Nondiscrimination Act threatened, a way to place control over personal DNA data into the hands of the owner of said DNA is good news indeed!