Confidentiality of genetic databases questioned by Stanford researchers


STANFORD, Calif. - In their exuberance over cracking the genetic code, scientists have paid too little attention to privacy issues, say researchers at the Stanford University School of Medicine. Their findings, published in the July 9 issue of Science, suggest that traditional means of ensuring confidentiality do not apply to genetic data and that additional safeguards are needed to protect patients from potential abuses.

"I am surprised that no one has looked at this problem before and asked, 'Can we really release genome-wide information about individuals to the public,'" said Zhen Lin, a genetics graduate student who led the study. "Nobody did a careful calculation to find whether 'anonymous' patients could be identified from this data."

Many hope that in the future, a patient's genetic information will provide early warning of a predisposition to certain diseases or predict how the patient will respond to certain drugs. The relationship between DNA and such traits is usually quite subtle, however, and decoding it may require compiling genetic and medical information from many patients. To this end, the National Institutes of Health is encouraging the researchers it funds to put genetic sequence and trait information in shared online databases.

A 1996 federal law that governs medical privacy requires that research data be stripped of identifying information such as names, addresses and even the last three digits of a patient's ZIP code before it can be shared. But the law is essentially silent on the issue of DNA, and most researchers have interpreted this to mean that sharing sequence data linked to information from a patient's medical history is safe.

"Traditionally people believe that if there is no identifier attached, then the sample is anonymous," Lin said. "We found that's really not true because the DNA code itself is an identifier." To demonstrate this, the researchers looked at specific sites in DNA that commonly vary from person to person, accounting for many genetic differences. Each person has about 5 million such sites in their DNA. Using a statistical model, the researchers found that matching 100 of these sites would identify an individual to a high degree of certainty.

In theory, someone who obtained a small amount of genetic information about a former research subject could later match it against a shared database and retrieve that subject's personal medical information.
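A minimal sketch of such a matching attack, using entirely synthetic data: the record layout (`record_id`, `genotypes`, `diagnosis`), the 0/1/2 genotype coding, and both helper functions are hypothetical and do not reflect any real database's schema. The point is only that the "anonymous" record carries no name, yet a known partial genotype profile is enough to pick it out along with its medical field.

```python
import random
from typing import Dict, List

# Site index -> number of copies of one allele (0, 1 or 2); a hypothetical coding.
Genotypes = Dict[int, int]


def find_matches(known: Genotypes, database: List[dict]) -> List[dict]:
    """Return every 'anonymous' record whose genotypes agree at all known sites."""
    return [
        record
        for record in database
        if all(record["genotypes"].get(site) == g for site, g in known.items())
    ]


def make_database(n_people: int, n_sites: int, seed: int = 0) -> List[dict]:
    """Synthetic records: random genotypes (0.25/0.5/0.25 proportions)
    plus a hypothetical diagnosis field; no names or addresses."""
    rng = random.Random(seed)
    return [
        {
            "record_id": i,
            "genotypes": {s: rng.choice((0, 1, 1, 2)) for s in range(n_sites)},
            "diagnosis": rng.choice(("none", "diabetes", "asthma")),
        }
        for i in range(n_people)
    ]


if __name__ == "__main__":
    db = make_database(n_people=1000, n_sites=500)
    # Suppose someone learned 100 sites from one former subject (here, record 42).
    known = {s: db[42]["genotypes"][s] for s in range(100)}
    hits = find_matches(known, db)
    print(f"{len(hits)} match(es); record {hits[0]['record_id']}, diagnosis: {hits[0]['diagnosis']}")
```

Exact matching is the simplest possible linkage rule; it is used here only to keep the sketch short.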

Why worry? Lin said that insurance companies and employers potentially have an interest in learning whether a person is prone to certain illnesses, and that malevolent individuals might also try to seek out this type of information. So Lin, along with advisor Russ Altman, MD, PhD, associate professor of genetics and of medicine, and statistics professor Art Owen, PhD, looked for ways to disguise the data while still maintaining its usefulness. For example, a computer program could randomly change one in 10 data points, a method similar to those used to protect other kinds of sensitive research data, such as census results. But, Altman said, "We realized that anything we tried ruined the data for research."
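The "one in 10 data points" idea can be sketched as random perturbation with a rate of 0.1. The article does not say how the Stanford group implemented or evaluated it, so this is an assumption-laden illustration: genotypes are coded 0, 1 or 2, and `perturb_genotypes` and `fraction_changed` are hypothetical helpers written for this example.

```python
import random
from typing import List


def perturb_genotypes(genotypes: List[int], rate: float = 0.1, seed: int = 0) -> List[int]:
    """Return a copy in which each call is, with probability `rate`,
    replaced by one of the other two genotype values (0, 1 or 2)."""
    rng = random.Random(seed)
    noisy = list(genotypes)
    for i, g in enumerate(noisy):
        if rng.random() < rate:
            noisy[i] = rng.choice([v for v in (0, 1, 2) if v != g])
    return noisy


def fraction_changed(original: List[int], noisy: List[int]) -> float:
    """Share of calls that differ between the true and the released data."""
    return sum(a != b for a, b in zip(original, noisy)) / len(original)


if __name__ == "__main__":
    rng = random.Random(42)
    true_calls = [rng.choice((0, 1, 1, 2)) for _ in range(10_000)]
    released = perturb_genotypes(true_calls, rate=0.1)
    print(f"fraction of released calls that are wrong: {fraction_changed(true_calls, released):.3f}")
```

Every altered call is a wrong genotype that a researcher analyzing the released data would inherit, which hints at the trade-off Altman describes; the sketch does not reproduce the group's own analysis of how much such noise harms research.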

The best solution, the group concluded, is probably to put such databases behind firewalls and allow access only to those who can prove they are researchers and who pledge to protect confidentiality. But they don't rule out other solutions. For now, they hope their paper will induce researchers to address privacy issues early in genetic database development.

Source: Eurekalert & others

Last reviewed: By John M. Grohol, Psy.D. on 21 Feb 2009