Microarrays, key genome expression trackers, work better when probes are sequence-verified


Many widely used probes don't match latest RefSeq database information

BETHESDA, Md. (July 22, 2004) -- Microarrays, sometimes referred to as biochips, have been used extensively to investigate genome-wide expression patterns and have facilitated a revolution in the characterization of cellular regulation. In addition, comprehensive gene expression profiling shows great potential for human disease diagnostics.

For instance, multiple research groups have shown that microarray data can identify previously unappreciated molecular subtypes of lung cancer that differ in their prognoses. Unfortunately, results are often poorly reproducible across studies.

Furthermore, there is now a tremendous volume of data, particularly from human clinical specimens, that cannot be duplicated, so strategies to improve the analysis of (that is, to "clean up") existing data sets are needed. One limitation of microarray technology may be that similar studies fail to measure identical biological parameters. In other words, the problem could arise because many microarray probes (there are now up to hundreds of thousands on a single slide) are based on gene sequences that are five or more years old.


Frustrated by more than two years of trying to analyze microarray data contrasting two known conditions, researchers at Harvard Medical School and Washington University in St. Louis decided to look at the nucleotide sequences that measure gene expression on the most widely used commercial microarray technology. They found that in many cases they did not match the most current information.

In this study, they undertook a global analysis of the microarrays and systematically attempted to confirm the accuracy of individual probe sequences, checking every probe on the array to see whether it corresponded to the gene it was intended to measure. They found that a substantial percentage of the probe sequences (sometimes as much as 20%, on both older and currently used platforms) did not perfectly match the appropriate mRNA as defined by the reference sequence (RefSeq).
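The idea of checking a probe against its intended transcript can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which this release does not describe. It assumes that "verified" means the probe sequence occurs as an exact substring of the RefSeq mRNA, and the function names, accessions and sequences are all made up for illustration.

```python
def is_sequence_verified(probe_seq, refseq_mrna):
    """Return True if the probe occurs exactly somewhere in the reference mRNA."""
    probe = probe_seq.strip().upper()
    target = refseq_mrna.strip().upper()
    return bool(probe) and probe in target


def verify_probes(probes, refseq):
    """Map each probe ID to whether it perfectly matches its intended transcript.

    probes: probe ID -> (intended accession, probe sequence)
    refseq: accession -> mRNA sequence
    """
    results = {}
    for probe_id, (accession, seq) in probes.items():
        mrna = refseq.get(accession)
        # A probe whose target accession is absent cannot be verified.
        results[probe_id] = mrna is not None and is_sequence_verified(seq, mrna)
    return results


# Toy, invented sequences and accessions (hypothetical, for illustration only).
toy_refseq = {"NM_TOY1": "ATGGCCTTAGGCTTACGATCGATCGGA"}
toy_probes = {
    "probe_a": ("NM_TOY1", "GGCTTACGATCG"),  # exact substring of the transcript
    "probe_b": ("NM_TOY1", "GGCTTACCATCG"),  # differs by one base
    "probe_c": ("NM_MISSING", "ATGGCC"),     # target accession not in the reference set
}
verified = verify_probes(toy_probes, toy_refseq)
```

In this toy set only probe_a passes. A real check would also have to handle strand orientation and multi-probe probe sets, which the sketch ignores.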

Research at Harvard's Brigham & Women's Hospital

The study, entitled "Increased measurement accuracy for sequence-verified microarray probes," will appear in the August 2004 edition of Physiological Genomics, one of 14 journals published by the American Physiological Society.

Researchers Brigham H. Mecham, Daniel Z. Wetmore and Thomas J. Mariani worked in the Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital (BWH) at Harvard Medical School, Boston; Zoltan Szallasi and Isaac Kohane were at the Children's Hospital Informatics Program of Harvard Medical School; and Yoel Sadovsky was at the Department of Obstetrics and Gynecology, Washington University School of Medicine, St. Louis, MO.

The work in this paper was supported by the Harvard Lung Biology Center, HL071885 (TJM), ES11597-01 (YS) and the Francis Families Foundation.


The researchers found many causes for the probe sequence inaccuracies, most notably that sequence information databases have improved constantly over time. Regardless of the nature of the inaccuracies, the study clearly shows that sequence-verified probes perform more consistently, and with higher accuracy, within replicates and across different versions of the technology.

They note that the leading manufacturer of such microarrays "apparently…has come to the same conclusion and has recently released a platform containing RefSeq-verified probes."

Based on a comprehensive analysis of probe sequences on the 20 most common mammalian microarray platforms, the researchers found that data derived from verified probes showed greater accuracy than data from unverified probes:

  • Between technical replicates
  • Across generations of same-platform technology
  • In comparisons between different technology platforms
  • When comparing patient-oriented data from multiple, independent diagnostic microarray studies.

After identifying the limitations of the probe sequences, they used this information to improve the application of the technology. On the diagnostic side, they tested the effects of probe sequence accuracy in data from two independent breast cancer expression profiling studies. Their results indicate that restricting data to sequence-verified probes can improve the diagnostic power of microarray technology.
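Restricting a data set to sequence-verified probes amounts to a simple filter applied before any downstream analysis. The Python sketch below shows the idea under stated assumptions: the probe IDs, expression values and verification flags are invented, and the function name is hypothetical rather than anything taken from the study.

```python
def restrict_to_verified(expression, verified):
    """Keep only probes flagged as sequence-verified.

    expression: probe ID -> list of expression values (one per sample)
    verified:   probe ID -> bool; probes missing from this map are dropped
    """
    return {
        probe_id: values
        for probe_id, values in expression.items()
        if verified.get(probe_id, False)
    }


# Invented toy values, for illustration only.
toy_expression = {
    "probe_a": [7.1, 6.9, 7.3],
    "probe_b": [4.2, 8.8, 5.0],
    "probe_c": [3.3, 3.1, 3.4],
}
toy_verified = {"probe_a": True, "probe_b": False}

filtered = restrict_to_verified(toy_expression, toy_verified)
```

Dropping probes with no verification record (probe_c here) is a conservative design choice for the sketch. An analysis could equally keep them in a separate "unknown" group.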

Discussion and data availability

The researchers stress that this result does not address any particular classification scheme. Instead, it indicates that once unverified probe sets were removed, the major component of change in the data was related to the underlying biology (in this data set, breast cancer) rather than to the source of the experiments.

"As combining data from multiple microarray platforms/technologies is certain to prove a common method, our results showing increased accuracy of sequence-verified probes across platforms (oligo vs. oligo and oligo vs. cDNA) substantiate the importance of using the most reliable information to verify equivalence of measurement across technologies," the researchers conclude.

The authors have created a website for checking sequences and measurements on microarrays for the 10 most common platforms, a number they expect to rise to 26 relatively soon. Called the "Lung Transcriptome," it was designed and built by Brigham Mecham, B.S., and Thomas Mariani, Ph.D., to serve "as both a microarray data repository and source for information and analytical tools for functional genomics-based, pulmonary-focused research applications."

Source: Eurekalert & others

Last reviewed: By John M. Grohol, Psy.D. on 21 Feb 2009
    Published on PsychCentral.com. All rights reserved.