Columbia University

Technology Ventures

Estimation of allelic differences from high-throughput sequencing experiments

Technology #2930

Biomedical researchers compare samples by next-generation sequencing to identify point mutations and small insertions and deletions (indels) in DNA and RNA sequences. In particular, researchers would like to correlate changes in the frequency of these mutations with phenotypic changes. To obtain useful sequencing data, it is therefore essential to know the frequency at which particular alleles are represented in the cell population. However, several technical difficulties currently stand in the way, including inhomogeneous samples, subclonal populations, and errors introduced by the sequencing technology itself. A new method is therefore needed that allows researchers to determine allele frequencies across a range of samples.

Computational algorithm accurately identifies allele frequency across different samples

This technology is an algorithm, SAVI (Statistical frequency Analysis of Sequence Data), that estimates the frequency of particular alleles in a sample and compares those frequencies across samples. Sequencing data is compared to a reference genome to identify potential variants at each position. The algorithm's statistical framework also incorporates prior data, such as the expected frequency of allelic differences, and can account for the quality of the sequencing reaction and the purity of the sample. This information is used to assess the validity of candidate allelic differences and to enhance sensitivity. The technology naturally accommodates multiple samples and multiple reference genomes. It has been successfully applied to identify mutations in patients with Hairy Cell Leukemia and Large B-Cell Lymphoma compared with healthy controls.
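The statistical idea described above (estimating a true allele frequency from read counts while accounting for base-call quality, sample purity, and prior knowledge, then comparing frequencies between samples) can be illustrated with a small Bayesian sketch. This is not SAVI's implementation. The binomial read model, the error/3 misread rate, the flat default prior, the grid evaluation, the independence of samples, and the function names `phred_to_error`, `frequency_posterior`, `posterior_mean`, and `prob_greater` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence


def phred_to_error(q: float) -> float:
    """Convert a Phred base-quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def frequency_posterior(
    alt: int,
    depth: int,
    error: float = 0.01,
    purity: float = 1.0,
    prior: Optional[Sequence[float]] = None,
    grid_size: int = 101,
) -> List[float]:
    """Posterior over the true allele frequency f on the grid f_i = i/(grid_size-1).

    Hypothetical binomial model (an assumption, not SAVI's published model):
    a read shows the alternate base with probability
        p(f) = purity*f*(1 - error) + (1 - purity*f)*error/3,
    i.e. alternate molecules are read correctly with probability 1 - error,
    and reference molecules (including contaminating normal cells) are
    misread as this specific alternate base at rate error/3.
    """
    if grid_size < 2 or not 0 <= alt <= depth:
        raise ValueError("need grid_size >= 2 and 0 <= alt <= depth")
    if not 0 < error < 1 or not 0 < purity <= 1:
        raise ValueError("need 0 < error < 1 and 0 < purity <= 1")
    if prior is None:
        prior = [1.0] * grid_size  # flat prior over f
    if len(prior) != grid_size:
        raise ValueError("prior must have one weight per grid point")

    log_post = []
    for i, w in enumerate(prior):
        f = i / (grid_size - 1)
        p = purity * f * (1 - error) + (1 - purity * f) * error / 3  # always in (0, 1)
        log_lik = alt * math.log(p) + (depth - alt) * math.log(1 - p)
        log_post.append(log_lik + (math.log(w) if w > 0 else -math.inf))

    # Normalise with the log-sum-exp trick to avoid underflow at high depth.
    m = max(log_post)
    if m == -math.inf:
        raise ValueError("prior assigns zero weight everywhere")
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(post: Sequence[float]) -> float:
    """Expected allele frequency under a grid posterior."""
    n = len(post)
    return sum(i / (n - 1) * w for i, w in enumerate(post))


def prob_greater(post_a: Sequence[float], post_b: Sequence[float]) -> float:
    """P(f_A > f_B), assuming independent posteriors on the same grid."""
    cum_b = 0.0  # running sum of post_b[j] for all j < i
    total = 0.0
    for wa, wb in zip(post_a, post_b):
        total += wa * cum_b
        cum_b += wb
    return total


if __name__ == "__main__":
    q30 = phred_to_error(30)
    tumor = frequency_posterior(alt=40, depth=100, error=q30, purity=0.8)
    normal = frequency_posterior(alt=1, depth=120, error=q30)
    print(round(posterior_mean(tumor), 2))  # close to 0.5: 40% alt reads at 80% purity
    print(prob_greater(tumor, normal))      # close to 1.0
```

Evaluating the posterior on a fixed grid keeps the sketch dependency-free and turns the cross-sample comparison into a simple cumulative sum; a production tool would likely use a finer model of sequencing error and would not necessarily treat samples as independent.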

Lead Inventor:

Raul Rabadan, Ph.D.

Applications:

  • Monitoring genetic variation during disease progression, for example in tumor samples
  • Studying genetic variation between diseased and healthy samples
  • Tracing the trajectory of disease progression through the frequency of somatic mutations
  • Identifying disease-associated alleles for use as prognostic markers to guide treatment
  • Designing complex experiments that compare multiple samples

Advantages:

  • Enhanced sensitivity for identifying allelic differences from high-throughput sequencing reads
  • Allele frequencies provide greater biological insight into the mechanisms of disease pathogenesis and progression
  • Allele frequencies can be compared across multiple samples, for example in a cohort of cancer patients versus healthy controls
  • Provides confidence that mutations found by high-throughput sequencing are genuine and not artifacts of sequencing and/or computational alignment

Patent Information:

Patent Pending

Tech Ventures Reference: IR 2930

Related Publications: