Columbia University

Technology Ventures

Method to identify disease biomarkers from gene expression data using attractor metagenes

Technology #cu12283

Lead Inventor: Dimitris Anastassiou
Managed By: Beth Kauderer
Patent Protection: US Patent Pending 20150105272

Advances in sequencing technology and data mining algorithms have made it feasible to analyze large data sets of gene expression profiles in order to identify genes associated with a given disease. However, current data mining algorithms often report false-positive disease associations, and they cannot isolate the genes responsible for a single phenotype or biological event that occurs during the course of a disease. This technology is a method that captures the information most relevant to disease phenotype from publicly available gene expression profiles. It uses “attractor metagenes” to identify biomarkers that point to the core biological mechanisms underlying a given disease. A metagene is a linear combination of the expression levels of individual genes; it represents a biological phenotype and is used to simplify gene expression data. The technology thus facilitates gene expression data mining by identifying and grouping the genes most pertinent to a specific disease phenotype, which may then serve as biomarkers for diagnosis, prognosis, and choice of therapeutic treatment.

Data mining algorithm that simplifies identification of genetic biomarkers for specific disease phenotypes

This technology provides a method that generates attractor metagenes from publicly available data sets of gene expression profiles, thereby streamlining biomarker data mining. It consists of an iterative algorithm that starts from any seed gene and converges to one of a small number of attractor metagenes, each of which represents a biomarker for a specific phenotype. While similar algorithms identify combinations of genes that represent multiple phenotypes related to a disease, this technology isolates the genes that affect a single disease phenotype, such as cell transdifferentiation or the presence of an amplicon. As a result, it may identify genetic biomarkers that more accurately portray specific phenotypes or biological events that occur during the course of a given disease. These attractor metagenes may therefore better inform clinical decisions about diagnosis, prognosis, and choice of therapeutic treatment. They may also elucidate the biological mechanisms underlying the disease, enabling future discovery of therapeutic targets.
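The iterative idea described above can be illustrated with a minimal sketch. This listing does not specify how the association between a gene and the metagene is measured or how the weights are formed, so the code below makes several assumptions: it uses Pearson correlation as the association measure, sets negative correlations to zero, and raises the result to a hypothetical sharpening parameter (`exponent`) to obtain the weights. The function name, its parameters, and the synthetic data are all illustrative and are not taken from the patent application.

```python
import numpy as np

def attractor_metagene(expr, seed, exponent=5.0, tol=1e-7, max_iter=100):
    """Iterate from a seed gene toward a stable weighted average of genes.

    expr: array of shape (genes, samples); seed: row index of the seed gene.
    Assumption: weights are clipped Pearson correlations raised to `exponent`.
    """
    # Standardize each gene so that correlations reduce to dot products.
    z = expr - expr.mean(axis=1, keepdims=True)
    z /= z.std(axis=1, keepdims=True)  # assumes no gene is constant
    n_samples = expr.shape[1]

    metagene = z[seed].copy()          # start from the seed gene's profile
    weights = np.zeros(expr.shape[0])
    for _ in range(max_iter):
        m = (metagene - metagene.mean()) / metagene.std()
        corr = z @ m / n_samples       # correlation of every gene with the metagene
        new_weights = np.clip(corr, 0.0, None) ** exponent
        new_weights /= new_weights.sum()
        metagene = new_weights @ z     # metagene = linear combination of genes
        converged = np.abs(new_weights - weights).max() < tol
        weights = new_weights
        if converged:
            break
    return weights, metagene

# Synthetic data (illustrative only): genes 0-9 share a hidden factor,
# the remaining genes are independent noise.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 100))
expr[:10] += 2.0 * rng.normal(size=100)

w_a, m_a = attractor_metagene(expr, seed=0)
w_b, m_b = attractor_metagene(expr, seed=3)
top_genes = sorted(np.argsort(w_a)[-10:].tolist())
print("top-weighted genes:", top_genes)
print("largest weight difference between seeds:", np.abs(w_a - w_b).max())
```

In this toy setting, two different seed genes from the same co-expressed group lead to essentially the same set of weights, which is the sense in which the result behaves like an attractor. A real analysis would need an association measure and stopping rule chosen for the data at hand.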

This technology has been tested on six rich gene expression data sets from three different cancer types. It identified attractor metagenes representing several biological events that drive tumor formation in these cancers, including a mesenchymal transition associated with tumor stage, a mitotic chromosomal instability associated with tumor grade, and several cancer amplicons.

Lead Inventor:

Dimitris Anastassiou, Ph.D.


Applications:

  • Method to data mine for genetic biomarkers of a specific disease phenotype
  • Biomarkers to assist in diagnosis/prognosis/treatment choice for a given disease
  • Research tool to help understand underlying mechanisms behind diseases for better drug target discovery


Advantages:

  • Attractor metagenes can be easily viewed in a linear format
  • Attractor metagenes group together only the most pertinent individual genes extracted from large data sets of gene expression profiles
  • No prior knowledge of gene function or gene interaction is needed
  • Attractor metagenes can accurately portray a single phenotype or biological event that occurs during the course of a given disease

Patent information:

Patent Pending (US 20150105272)

Tech Ventures Reference: IR CU12283