Searching Audio and Video by Sound

"An ability to locate the sound you want in a database of such sounds, is clearly of far-reaching economic value. The Comparisonics breakthroughs in this area are fun to play with and, more seriously, of very great potential commercial value." — Dr. Harry M. Markowitz, Nobel laureate in Economics, and Computer Science pioneer

The Problem

One of the most important operations performed by computers is searching. Efficient methods for the search and retrieval of numeric data and text documents have been the focus of computer scientists for more than 50 years. In recent years, computers have rapidly evolved from numeric and text processing to include multimedia, specifically audio, video, and images. However, few methods exist for searching multimedia.

Audio and video collections have been searchable only through text descriptions. That is, each audio or video clip is described in words by a human cataloger, who types the description into the computer. These descriptions are then searched by keyword to locate clips of interest.

"Woefully insufficient!" says The Hollywood Reporter in its white paper on multimedia retrieval. Creating text descriptions for audio and video is a burden that costs time and money, and the value of such descriptions is limited. Sounds are difficult to describe in words (is it a bang, a crash, a thud?), and as a result, audio collections have been largely inaccessible.


The Solution

It makes sense to search documents based on the words they contain, and to search audio based on the sounds it contains. Comparisonics Corporation has invented the technology that makes this possible. It is called sound matching: the ability of a computer to compare sounds automatically and determine whether, and to what extent, they are similar. Given any sound, similar sounds can be located automatically in audio and video collections. In database terminology, this is a form of "query by example" and "content-based retrieval."

To search audio or video, the digital audio content is first characterized by an automated indexing process. Then any sound can be used to find matches in the indexed collection. The given sound, called the prototype, is compared automatically with each sound in the collection. The degree of similarity between two sounds is indicated by a similarity score that ranges from 0 (least similar) to 100 (most similar). This score permits sounds to be ranked by similarity to the prototype.
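
The index-then-match workflow described above can be sketched in a few lines of Python. This is only an illustration of query by example, not the Comparisonics method, which is not described here. The feature (signal magnitude at a handful of probe frequencies), the cosine-based score, and every name in the code (PROBE_HZ, characterize, similarity, search, tone) are assumptions made for the example.

```python
import math

# Hypothetical probe frequencies in Hz; chosen for illustration only.
PROBE_HZ = [200, 400, 800, 1600, 3200]

def characterize(samples, sample_rate=8000):
    """Toy indexing step: signal magnitude at each probe frequency."""
    features = []
    for hz in PROBE_HZ:
        step = 2 * math.pi * hz / sample_rate
        re = sum(s * math.cos(step * n) for n, s in enumerate(samples))
        im = sum(s * math.sin(step * n) for n, s in enumerate(samples))
        features.append(math.hypot(re, im))
    return features

def similarity(a, b):
    """Score from 0 (least similar) to 100 (most similar), via cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0
    # Clamp to guard against floating-point overshoot just above 100.
    return min(100.0, 100.0 * dot / norm)

def search(prototype, index):
    """Rank every indexed sound by similarity to the prototype, best first."""
    scored = [(similarity(prototype, feats), name) for name, feats in index.items()]
    return sorted(scored, reverse=True)

def tone(hz, sample_rate=8000, n=800):
    """Synthetic test sound: a pure sine tone."""
    return [math.sin(2 * math.pi * hz * i / sample_rate) for i in range(n)]

# Index a tiny "collection" once, then query it with a prototype sound.
index = {
    "low_hum": characterize(tone(200)),
    "high_beep": characterize(tone(3200)),
}
ranking = search(characterize(tone(200)), index)
for score, name in ranking:
    print(f"{name}: {score:.0f}")
```

The point of the sketch is the division of labor: characterization happens once per recording, and each query only compares compact feature vectors, which is why searching can be much faster than playing the audio.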

Recordings can be characterized in about 1% of the time it takes to play them, plus the time required to read the audio data from storage media. Once characterized, thousands of hours of audio and video can be searched in seconds! Because characterization runs far faster than playback, it keeps up with audio in "real time," so even live material can be searched, such as ongoing radio and television programs and multimedia streamed over the Internet.
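
As a rough worked example of the 1% figure, the estimate below computes how long it might take to characterize a collection. The function name and the read_hours parameter are hypothetical, and the read time is left at zero because the text gives no figure for it.

```python
def characterization_hours(playback_hours, fraction=0.01, read_hours=0.0):
    """Estimated processing time: about 1% of playback time plus storage read time."""
    return playback_hours * fraction + read_hours

# A 1,000-hour collection, ignoring the time to read data from storage.
estimate = characterization_hours(1000)
print(f"About {estimate:.0f} hours of processing for 1,000 hours of audio")
```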

The Comparisonics® sound-matching technology works for all possible sounds, including sounds from people, animals, machinery, and musical instruments, as well as noise, electronic tones, and environmental ambiences. It works for any number of audio channels, up to 32 bits of resolution per channel, and for all sample rates of at least 8,000 samples per second.
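
The format limits stated above can be expressed as a simple pre-check before indexing a recording. This is a sketch based only on the limits in the text; the function name and parameters are hypothetical and do not correspond to any Comparisonics interface.

```python
def is_supported(channels, bits_per_sample, sample_rate):
    """Check a recording's format against the limits stated in the text."""
    return (
        channels >= 1                    # any number of audio channels
        and 1 <= bits_per_sample <= 32   # up to 32 bits of resolution per channel
        and sample_rate >= 8000          # at least 8,000 samples per second
    )

print(is_supported(channels=2, bits_per_sample=16, sample_rate=44100))
print(is_supported(channels=1, bits_per_sample=8, sample_rate=4000))
```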



© 2010 Comparisonics Corporation