Predicting cancer patient survival with gene expression data



Cancer specialists often talk about cancer as an umbrella term for over 200 different diseases, each having unique characteristics. But even these categories are too broad, as the same type of cancer can take very different paths in different people. Researchers have traditionally diagnosed and treated cancer based on microscopic analysis of cell size and shape, a method that is especially difficult to apply to very closely related cancers, such as non-Hodgkin's lymphoma, which has 20 subtypes. As scientists learn more about the molecular alterations in cancer, they are beginning to establish cancer subtypes based on the underlying molecular footprint of a tumor. Now Eric Bair and Robert Tibshirani describe a procedure that combines gene expression data with patients' clinical history to identify biologically significant cancer subtypes, and they show that this method is a powerful predictor of patient survival.

Their approach first uses clinical data to identify a list of genes associated with a particular clinical factor, such as survival time, tumor stage, or metastasis, and then applies statistical analysis to that list to find additional patterns that define clinically relevant subsets of genes. In many retrospective studies, patient survival time is known even though tumor subtypes are not, so Bair and Tibshirani used the survival data to guide their analysis of the microarray data. They calculated the correlation of each gene in the microarray data with patient survival to generate a list of "significant" genes and then used these genes to identify tumor subtypes. Creating a list of candidate genes based on clinical data, the authors explain, reduces the chances of including genes unrelated to survival and so increases the probability of identifying gene clusters with clinical, and thus predictive, significance. Such "indicator gene lists" could identify subgroups of patients with similar gene expression profiles. These subgroups, defined by the gene expression profiles and clinical outcomes of previous patients, could then be used to assign future patients to the appropriate subgroup.
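To make the sequence of steps concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: it scores each gene by plain Pearson correlation with survival time (ignoring censored patients), keeps a fixed number of top-scoring genes rather than applying a significance threshold, and finds two subgroups with a small k-means loop. All function names and parameters (`select_genes`, `two_means`, `fit_subtypes`, `assign_patient`, `n_genes`) are hypothetical.

```python
import numpy as np


def survival_correlation(expr, survival):
    """Pearson correlation between each gene (column of expr) and survival time."""
    x = expr - expr.mean(axis=0)
    y = survival - survival.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant genes get correlation 0
    return (x * y[:, None]).sum(axis=0) / denom


def select_genes(expr, survival, n_genes):
    """Indices of the n_genes genes most strongly correlated with survival."""
    corr = survival_correlation(expr, survival)
    return np.argsort(-np.abs(corr))[:n_genes]


def nearest_centroid(data, centroids):
    """Label each row of data with the index of its closest centroid."""
    dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def two_means(data, n_iter=100):
    """Split samples into two clusters with a small k-means (k = 2) loop."""
    # Deterministic start (an assumption): sample 0 and the sample farthest from it.
    far = ((data - data[0]) ** 2).sum(axis=1).argmax()
    centroids = np.stack([data[0], data[far]]).astype(float)
    for _ in range(n_iter):
        labels = nearest_centroid(data, centroids)
        updated = centroids.copy()
        for k in range(2):
            if np.any(labels == k):  # keep the old centroid if a cluster empties
                updated[k] = data[labels == k].mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def fit_subtypes(expr, survival, n_genes=10):
    """Pick survival-related genes, then cluster patients on those genes only."""
    genes = select_genes(expr, survival, n_genes)
    labels, centroids = two_means(expr[:, genes])
    return genes, labels, centroids


def assign_patient(profile, genes, centroids):
    """Assign a new patient's expression profile to the nearest subgroup."""
    return int(nearest_centroid(profile[genes][None, :], centroids)[0])
```

Here `expr` is assumed to be a patients-by-genes array and `survival` a vector of survival times for the same patients. Calling `fit_subtypes(expr, survival)` on a retrospective cohort returns the selected gene indices, a subgroup label for each patient, and the subgroup centroids; `assign_patient` then places a future patient's profile into one of those subgroups. Restricting the clustering to survival-related genes is the design point the article emphasizes: clustering on all genes could be dominated by variation that has nothing to do with outcome.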

By providing a method to narrow the thousands of genes measured by a microarray down to those most likely to have clinical relevance, Bair and Tibshirani have created a powerful tool to identify new cancer subtypes, predict expected patient survival, and, in some cases, help suggest the most appropriate course of treatment.

Source: Eurekalert & others

Last reviewed: By John M. Grohol, Psy.D. on 21 Feb 2009