Audio fingerprinting for identification of similar or identical audio in videos

Technology #m10-035

Daniel P.W. Ellis
Satish Rao
As databases of audio and video media continue to grow, efficient methods to identify, sort, and locate similar or identical files are increasingly in demand. This technology is a robust acoustic fingerprinting algorithm that characterizes a particular audio event within a media file and identifies similar events within a database. It is useful for quickly identifying music and film copyright infringement, managing personal or public media databases, and identifying a single event recorded on multiple devices with differing background noise.

Matching pursuit algorithm expands utility to distinct events within an audio or video file.

Audio fingerprints are obtained using the matching pursuit algorithm, which decomposes a signal into distinctive energy bursts localized in time and frequency. These bursts form landmarks, and each landmark's signature and location in the audio file are placed in an easily searchable hash table. Similar events in a database can then be identified quickly by querying the hash table. Because energy bursts are used as the characterization metric, the technology can identify unique acoustic events even in the presence of noise or background sound.
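The pipeline described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`matching_pursuit`, `landmark_hashes`, `build_index`, `query`), the `max_dt` parameter, and the choice to form each landmark by pairing two nearby energy bursts are all assumptions made for illustration. The dictionary passed to `matching_pursuit` is assumed to hold unit-norm atoms; in practice these would likely be time-frequency atoms such as Gabor functions, which this summary does not specify.

```python
from collections import Counter, defaultdict

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit over a dictionary of unit-norm atoms.

    Each step picks the atom most correlated with the residual, records
    (atom_index, coefficient), and subtracts that atom's contribution.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        # Inner product of the current residual with every atom.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        coef = scores[best]
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
        picks.append((best, coef))
    return picks, residual

def landmark_hashes(atoms, max_dt=32):
    """Turn (time_bin, freq_bin) bursts into ((f1, f2, dt), t1) landmarks.

    Assumption: a landmark is a pair of bursts at most max_dt time bins apart.
    The key (f1, f2, dt) is the signature; t1 is the location in the file.
    """
    atoms = sorted(atoms)
    for i, (t1, f1) in enumerate(atoms):
        for t2, f2 in atoms[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # atoms are time-sorted, so later ones are farther away
            if dt > 0:
                yield (f1, f2, dt), t1

def build_index(library):
    """Map each landmark signature to the (file_id, time) places it occurs."""
    index = defaultdict(list)
    for file_id, atoms in library.items():
        for key, t1 in landmark_hashes(atoms):
            index[key].append((file_id, t1))
    return index

def query(index, atoms):
    """Vote for (file_id, time_offset); consistent offsets indicate a match."""
    votes = Counter()
    for key, tq in landmark_hashes(atoms):
        for file_id, t1 in index.get(key, ()):
            votes[(file_id, t1 - tq)] += 1
    return votes.most_common(1)[0] if votes else None

# Toy usage with made-up bursts: the query is file "a" from time bin 3
# onward, plus one spurious burst standing in for background noise.
library = {
    "a": [(0, 10), (3, 20), (5, 15), (9, 30)],
    "b": [(0, 11), (2, 21), (6, 16), (8, 31)],
}
index = build_index(library)
clip = [(0, 20), (2, 15), (4, 99), (6, 30)]
best_match = query(index, clip)  # (("a", 3), 3): file "a", offset 3, 3 votes
```

Voting on the time offset rather than on raw hash hits is one way to tolerate spurious bursts: landmarks created by the noise burst either miss the table entirely or land on scattered offsets, while true matches pile up on a single offset.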

The algorithm’s performance was tested on over 700 YouTube videos of the 2009 presidential inauguration address.

Daniel P.W. Ellis, Ph.D.


Applications:

  • Sorting, classifying, and searching videos that have sound
  • Identifying whole or partial songs or sounds, such as an explosion or a car horn
  • Detecting copyright violations within larger audio or video files, such as a song or movie clip inappropriately inserted into a large montage or played over a video clip as background music
  • Searching for identical songs or songs with similar features from a large database
  • Generating music suggestions
  • Identifying multiple recordings of a single event captured by different people or from different points of view, such as videos of the Boston Marathon bombing or of a speech posted on YouTube by multiple users


Advantages:

  • Expands audio fingerprinting to video files
  • Identifies and matches acoustic events despite differences in background sound or noise
  • Distinguishes between distinct events
  • Quickly searches large databases for matching or similar events

Patent Issued (US 8,706,276)

Tech Ventures Reference: IR M10-035

