This 2004 article, written by S. V. Rice and S. M. Bailey, tells the story of FindSounds.
Sounds are selected and incorporated into theatrical productions and radio and television programs. Music is composed of sounds, and music combined with dialogue and sound effects forms the movie soundtrack. Sounds are vital to animation and computer games.
Sounds for Theatre, Film, Radio, and Television
Sound effects were used in the ancient Greek theatre of Aeschylus, Euripides, and Sophocles. In Elizabethan theatre, scripts called for the sounds of alarms, chimes, and gunshots, and skilled vocalists imitated the baying of hounds and crowing of roosters. Many theatres utilized "thunder runs," sloping wooden or iron alleys down which cannon balls were rolled to produce the sound of thunder. In 1708, John Dennis devised an improved method for making thunder: shaking a metal sheet that is suspended by wires. His "thunder sheet" was widely copied by others, whom he accused of "stealing his thunder," originating the expression.
Silent films were accompanied by a pianist or organist and often by sound-effects artists working their craft. In the 1930s, the production of sound effects for "talkies," theatre, and radio increased in sophistication. Thousands of prerecorded sounds became available on 78 rpm phonograph records, and "manual" sound effects were created by clever use of an enormous variety of objects and devices. A 1936 "how-to" guide, written by a stage director of the Old Vic Theatre in London, instructs the "effectsman" in the art of creating "noises off" (off-stage sound effects) including household, machine, nature, and "explosive" sounds. A 1940 guide for radio describes how to make sounds using "gadgets that can be found in most attics or basements" and mentions an "alphabetical glossary" at NBC Radio containing thousands of techniques for sound generation. Such a cookbook might include the following recipes.
In Raiders of the Lost Ark (1981), the sound of face punches came from slapping a leather jacket onto the hood of an old fire engine and dropping overly ripe fruit onto concrete; the sound of the giant rolling boulder is the sound of a Honda station wagon rolling down a gravel slope. The ghost sounds in Ghostbusters II (1989) were produced by a rice steamer [4,5].
"Sound is 50% of the motion picture experience." — George LucasSounds for Music
The musical instruments available to composers of Western music were essentially unchanged throughout the 18th and 19th centuries. By the beginning of the 20th century, composers sought to enlarge the palette of sounds. The percussion section, home to unconventional instruments, was expanded by Debussy and Strauss. In addition to innovative use of percussion, Stravinsky and Bartók devised novel techniques for playing traditional instruments to obtain new sounds.
Russolo and Marinetti of the Italian Futurist movement presented a concert in Milan in 1914 that employed "bumblers, exploders, thunderers, and whistlers." Satie's Parade, incorporating sirens, starting pistols, typewriter, and foghorn, caused a scandal in Paris in 1917; conservative listeners considered it blasphemous for music to include such sounds. A similar furor erupted in New York in 1927 when Antheil's Ballet Mécanique was performed by an ensemble of pianos, anvils, bells, buzzers, saws, car horns, and airplane propellers.
In the 1920s, Edgard Varèse crusaded for the right to make music with any and all sounds. His sentiments were echoed in the 1930s by John Cage, whose First Construction in Metal (1939) utilizes five differently-pitched thunder sheets and four brake drums. Pierre Schaeffer, the "Musician of Sounds," led a group of Paris musicians known as Musique Concrète. His pioneering composition Étude aux Chemins de Fer (1948) is a fascinating montage of sounds recorded at the Paris train depot and demonstrated that any sound is raw material for creative use.
The arrival of electronic sound synthesizers was heralded by many composers. The RCA Electronic Music Synthesizer of the 1950s could generate a sequence of sounds and the composer could specify the pitch, volume, color, articulation, and duration of each sound. Varèse extolled the "electronic medium" for adding "an unbelievable variety of new timbres to our musical store," and for "the possibility of obtaining any differentiation of timbre, of sound-combinations, and new dynamics far beyond the present human-powered orchestra." The Moog synthesizer of the 1960s, made famous by Wendy Carlos in Switched-on Bach (1968), was the first synthesizer to be mass produced, and by the early 1970s, the use of synthesizers was widespread [6-8].
"I don't care too much about music. What I like is sounds." — Dizzy GillespieTransforming Sounds to Create More Sounds
Phonographs with variable speed control were needed in the 1920s to play "78 rpm" records because the speed at which they were actually recorded ranged from 70 to 85 rpm. Interesting sounds can be created by slowing down or speeding up a recording, so the speed control became a valuable tool of the sound designer. Hindemith and Toch composed short pieces using phonographic speed change by 1930, and Varèse, Cage, and Schaeffer experimented considerably with the technique. In his book on sound effects, Robert L. Mott recounts how a single recording of a waterfall, when played at different speeds, was used to create the sounds of ocean surf, city traffic, a jet airplane, an atomic bomb explosion, and a printing press . In Indiana Jones and the Last Crusade (1989), a recording of chickens was speeded up and used as the sound of a cave filled with rats . Walter Murch, regarded as the dean of sound designers, would change the speed of a sound (for example, the outboard motor in Godfather II, 1974) so that it would harmonize with the background music and prevent dissonance .
The physical environment in which sounds are recorded can have a great influence on the recording. A carpeted living room, a tiled bathroom, a suburban backyard, and an urban alley alter sounds in distinct ways. Sounds can be recorded through a window, open or closed. The sound of Luke Skywalker's land speeder in Star Wars (1977) is the sound of a Los Angeles freeway recorded through a vacuum-cleaner tube . Ann Kroeber's Common Sounds Heard in Uncommon Ways (2000) includes sounds captured by microphones placed inside a steam iron and a soda machine.
In a process known as "sweetening," sounds are layered to create new sounds. For King Kong (1933), the pioneering Murray Spivack devised the sounds of the giant ape by blending recordings of lions and tigers, some played in reverse and at different speeds. For the 1998 version of Godzilla, sound designers developed the monster's roar by combining musical instrument and animal sounds with the original roar from the 1950s Japanese films. The voice of Chewbacca in Star Wars was constructed from bear, dog, lion, and walrus vocalizations. The sound of the sandworms in Dune (1984) was a mixture of speed-altered recordings of a baboon, horse, puma, and several pigs . The sounds of torpedoes in The Hunt for Red October (1990) were layered with "animal growls and shrieks, a Ferrari engine, and a screeching screen door spring" to "imbue the weapon with a vengeful purpose ."
Sounds may be transformed electronically by a variety of techniques including equalization, filtering, reverberation, modulation, chorusing, flanging, and phasing. Digital audio workstations make it easy to edit sounds and to juxtapose and overlay them in unlimited ways, what David Sonnenschein has so aptly termed "the digital sculpting of sounds ."
"Choice is the beginning of art." — Igor StravinskyStorage and Retrieval of Sounds
The phonograph was the first device for audio storage and retrieval. For a 1930s radio drama, a sound-effects artist would play sound-effects records using three or more turntables, each with two tone arms and speed and volume controls. Effects were marked on the records with chalk for fast cueing. The dexterity of the 1930s artist would impress today's hip-hop disc jockeys, for whom the turntable is a musical instrument in its own right.
In the early 1950s, the tape recorder became a tool for creative use. Tapes could be speeded up and slowed down, and could be cut and spliced for editing. Multi-track tapes facilitated sound mixing. In the 1960s, cartridge and cassette tapes emerged along with the "cart machine" for triggering the playback of cartridge tapes.
Audio went digital with the arrival of the compact disc (CD) in 1983 and digital audio tape (DAT) in 1987. Digital samplers also arrived in the 1980s, enabling brief digital recordings or "samples" to be played by pressing the keys of a piano-style keyboard. (The first sampler, the Mellotron, was developed in the 1960s and assigned a tape to each key.) By the 1990s, digital recordings were commonly stored in computer disk files, and affordable software became available to play, record, and edit them.
Sequential listening to audio recordings is a tedious way to search for sounds. The printed liner notes of records, tapes, and CDs provide descriptions of recordings, and in electronic form, they can be searched by keyword to locate recordings of interest. However, the value of this technique is limited by the fact that sounds are so difficult to describe.
Onomatopoeia is the formation of words to imitate sounds, for example, buzz, crunch, hiss, pop, screech, and thud. People who catalog sounds have raised onomatopoeia to an art form in desperate attempts to describe sounds. The following descriptions appear in a current sound-effects catalog: gedunk, kablam, kabong, pingy wobbles, wiggle bowang, zing. And catalogers work overtime to find the right adjectives: "searing harmonic slashes," "industrial amorphous textured presence," "incendiary fuzz mutations." Such descriptions convey little information, do not translate well to other languages, and are nearly useless for keyword searches.
Describing the source of a sound, if known, is far easier than describing the sound itself, and most catalogers resort to this approach. Most of us know the sounds of a "Honda Accord idling," "several coins dropped on a tile floor," and a "roller coaster passing by." Source descriptions are less useful if we are unfamiliar with the sounds, for example, "llama vocalizing," "slab of steel emerging from a furnace," and "water lock gates opening."
Although easier to describe, the source of a sound is of little interest to a sound designer who intends to use the sound for something else. In fact, knowing the source makes it harder to evaluate the sound. It is difficult to imagine that a cat can create the sound of a monster, but if you don't know that a sound came from a cat, you can listen to it objectively. Mott encourages sound designers to "disassociate the names of the sounds with the sounds themselves" and to "concentrate on the sound" and "ignore its source." Legendary sound designer Ben Burtt makes it a practice to play sounds for the director without telling him their source so that he will listen to them without being influenced by their origin. Gary Rydstrom, another renowned designer, believes the most important talent for sound design is the ability to separate what a sound is from how it is made.
If the source of a sound is a synthesizer, then how should it be described? Consider a synthesizer sound used in a Star Trek movie to warn that the dilithium crystals are going to overload. "Weird electronic sound" and "dilithium crystal alarm" are clearly inadequate for retrieval purposes. A synthesizer can generate thousands of sounds that cannot meaningfully be expressed in words.
The limitations of searching for sounds by searching their text descriptions have inspired computer scientists to develop methods for content-based audio retrieval. In a "sounds-like search" or "query by sound example," a computer algorithm identifies the sounds in a collection that are most similar to an example or prototype sound. Recordings are retrieved based on how they sound, regardless of how or if they have been described in words. The example sound may be all or part of any recording. It may be an ad hoc recording of the user's voice or props mimicking a desired sound, or a recording that has been retrieved by a prior sounds-like search or keyword search.
The Comparisonics® "sound-matching" algorithm was developed in 1997. In the "indexing" step, digital audio data is analyzed by the algorithm and characterized by "signatures," where each signature is a vector of perceptual features encoded as a 16-byte quantity. In the comparison step, a signature is derived from the prototype and compared with the signatures computed for an indexed collection. For each indexed sound, a score is determined indicating the degree of similarity between the sound and the prototype, ranging from 0 (least similar) to 100 (most similar, i.e., identical). The sounds most like the prototype are displayed for the user in order of decreasing score, so that the best matches appear first in the list. The time required to compute the signature of a recording is less than one percent of the recording's playing time; therefore, sounds may be indexed in real time, as they are being recorded. In the comparison step, similarity scores can be computed for more than two million pairs of signatures per second.
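The comparison step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual Comparisonics features and similarity measure are not given in this article, so the byte-wise distance used here is a stand-in chosen solely because it yields scores from 0 to 100, and the names `similarity_score` and `rank_matches` are hypothetical.

```python
from typing import Dict, List, Tuple

SIGNATURE_BYTES = 16  # the article states that a signature is a 16-byte quantity


def similarity_score(a: bytes, b: bytes) -> float:
    """Return a score from 0 (least similar) to 100 (identical).

    ASSUMPTION: a normalized byte-wise L1 distance stands in for the
    unpublished Comparisonics similarity measure.
    """
    if len(a) != SIGNATURE_BYTES or len(b) != SIGNATURE_BYTES:
        raise ValueError("signatures must be exactly 16 bytes")
    distance = sum(abs(x - y) for x, y in zip(a, b))
    max_distance = 255 * SIGNATURE_BYTES
    return 100.0 * (1.0 - distance / max_distance)


def rank_matches(prototype: bytes, index: Dict[str, bytes],
                 limit: int = 200) -> List[Tuple[str, float]]:
    """Score every indexed signature and return the best matches first."""
    scored = [(name, similarity_score(prototype, sig)) for name, sig in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Because each comparison touches only two short signatures rather than the audio itself, a linear scan of a large index stays cheap, which is consistent with the throughput figure quoted above.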
This algorithm emulates the human perception of sound similarity. Computers lack ears and human intelligence, so it is a challenge to develop an algorithm that hears sounds as humans do. Ultimately, humans are the judge of its accuracy. The Comparisonics algorithm is designed to work for all possible sounds and can compare recordings even if they differ in their duration, sample rate, file format, resolution, or compression.
Searching the Web for Sounds
FindSounds.com is a free Web site developed by Comparisonics Corporation where visitors can search the Web for sounds. It is a Web search engine like Google, but on a smaller scale and with a focus on sounds. Each month it processes more than one million sound searches for more than 100,000 unique visitors. Since its debut on August 1, 2000, it has processed more than 35 million sound searches. FindSounds.com appeals to the general Internet audience and is especially valuable to sound designers, musicians, filmmakers, videographers, animators, and game developers.
As with other Web search engines, queries are processed using a precomputed index of Web files. However, rather than indexing HTML pages or image files, the FindSounds index stores information about audio files. In response to a query, a list of "hits" provides links to audio files. Clicking on a link causes an audio file to be downloaded and played by an audio player program on the user's computer (e.g., Windows Media Player). Any file may be saved to the user's hard drive. Like any Web content, files may contain copyrighted material and it is the user's obligation to obtain copyright clearance if required for the intended use.
Keyword searches are performed by entering any word or phrase in a search box, or by clicking on one of the 500 "keyword links" that appear within categories on the Sound Types page. For example, clicking on the "elephant" link is a shortcut for typing "elephant" into the search box. The results of a keyword search for "bell" are shown in Figure 1 below. Up to 200 hits may be retrieved and are displayed ten to a page. Clicking on a URL or play icon downloads and plays a file. A short description of the sound appears in bold lettering below the URL, followed by the file size, number of channels, resolution, sample rate, and duration. Clicking on the "show page" link displays a Web page that refers to the file and may contain copyright information. The "e-mail this sound" link makes it easy to e-mail the file's URL.
Figure 1. List of hits for a keyword search at FindSounds.com.
Notably, above each URL is a Comparisonics waveform display. This is an audio waveform display that has been color coded to convey the frequency content of the recording. Reds signify high frequencies, greens denote middle-to-high frequencies, blues represent low-to-middle frequencies, and dark colors indicate low (bass) frequencies. Similar sounds are mapped to similar colors, and changes in sound are seen as changes in color. This display serves as a "thumbnail" image providing information about the sounds in a file. Users learn to "read" the waveform, that is, they can get an impression of what a file will sound like simply by inspecting its waveform, which helps them to decide which files to download and play.
To the right of the play icon is the sounds-like search icon. Clicking on this icon launches a sounds-like search that utilizes the Comparisonics sound-matching algorithm to locate sounds on the Web that are similar to this sound. The 200 best matches are returned, ten to a page, in order of decreasing similarity to the prototype. The matches are determined based entirely on their audio characteristics, uninfluenced by file names and text descriptions. As a result, the sound of a revving engine may match a growling tiger, screeching tires may match a ranting chimpanzee, and a tympani roll may match a rumble of thunder. Such matches are of interest to sound designers but would never be discovered from text descriptions. A sounds-like search is a tool for browsing, exploring, and discovering sounds.
A "combined" search is both a sounds-like search and a keyword search. After performing a sounds-like search, the user can limit the display of matches to those that have been described using a particular keyword. For example, if the prototype is the sound of an engine, the user might choose to limit the display of matches to those labelled "engine." Creatively applied, a combined search can find coyote howls that sound like a siren and saxophone samples that resemble an elephant's bellow.
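A combined search, as described above, amounts to filtering a sounds-like result list by keyword. The following Python sketch assumes a hypothetical `Hit` record and a simple case-insensitive substring match; how FindSounds actually matches keywords is not stated in this article.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical record for one sounds-like match."""
    url: str
    score: float                 # similarity to the prototype, 0 to 100
    description: Optional[str]   # None when the sound is unlabelled


def combined_search(hits: List[Hit], keyword: str) -> List[Hit]:
    """Keep only matches whose description mentions the keyword.

    ASSUMPTION: case-insensitive substring matching; unlabelled sounds
    can never satisfy a keyword restriction.
    """
    needle = keyword.lower()
    kept = [h for h in hits if h.description and needle in h.description.lower()]
    return sorted(kept, key=lambda h: h.score, reverse=True)
```

For instance, restricting the matches for a siren prototype to the keyword "coyote" would surface the coyote howls that sound most like that siren.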
The FindSounds index is highly selective. It does not include speech or song recordings, although it does include non-speech utterances of the human voice (e.g., a grunt or scream) and samples of notes, chords, and beats that could be incorporated into a song. Because speech and song recordings are excluded, a keyword search for "elephant" returns only elephant sounds. By contrast, an indiscriminate indexing of audio files produces a list of hits in which elephant sounds are interspersed with recordings of people speaking about elephants and with songs about elephants (e.g., Henry Mancini's Baby Elephant Walk).
The FindSounds index is created by a semi-automated process. First, the FindSounds "spider" program finds audio files on the Web and downloads them for analysis. FindSounds.com is focused on short recordings, so files longer than 10 seconds are rejected. A file will also be rejected if it has an invalid format or unsupported compression, or is a poor-quality recording (i.e., is too quiet, has an excessive DC offset, or has a sample rate below 8 kHz). The analysis automatically rejects about 90% of the files. The remaining 10% proceed to the auditioning phase in which a human listener rejects any file that contains at least one spoken word (to exclude speech recordings) and any file that contains a sequence of at least three different notes or chords (to exclude song recordings). Any file deemed obscene is also rejected (to make FindSounds.com safe for children to use). About 85% of the auditioned files are rejected.
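The automatic screening rules listed above might look roughly like the following Python sketch. The 10-second limit and the 8 kHz floor come from the text; the loudness and DC-offset thresholds are assumed values chosen for illustration, and the function name `rejection_reason` is hypothetical. Samples are taken to be decoded floats in the range -1.0 to 1.0, so format and compression checks are out of scope here.

```python
from typing import Optional, Sequence

MAX_DURATION_S = 10.0       # from the article: longer files are rejected
MIN_SAMPLE_RATE_HZ = 8000   # from the article: sample rates below 8 kHz are rejected
MIN_PEAK = 0.05             # ASSUMED threshold for "too quiet"
MAX_DC_OFFSET = 0.1         # ASSUMED threshold for "excessive DC offset"


def rejection_reason(samples: Sequence[float], sample_rate: int) -> Optional[str]:
    """Return why a recording would be rejected, or None if it passes."""
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        return "sample rate below 8 kHz"
    if not samples:
        return "empty recording"
    if len(samples) / sample_rate > MAX_DURATION_S:
        return "longer than 10 seconds"
    mean = sum(samples) / len(samples)  # a nonzero mean is a DC offset
    if abs(mean) > MAX_DC_OFFSET:
        return "excessive DC offset"
    peak = max(abs(s - mean) for s in samples)
    if peak < MIN_PEAK:
        return "too quiet"
    return None
```

Files that pass such a filter would still need the human auditioning step, since none of these checks can tell speech or song from other sounds.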
Text descriptions cannot reliably be derived in an automatic way from audio file names or from text that surrounds links to audio files; therefore, accepted files go through a labelling process in which a human cataloger listens to each file and enters a description for it, if it is possible to do so. These descriptions appear in bold lettering in a list of hits and are used to answer keyword queries. However, many sounds defy description. About 58% of the files in the index are described in words; the remaining 42% are unlabelled, yet can be retrieved by a sounds-like search.
Automatic duplicate detection is an essential part of the indexing process. The FindSounds spider has located as many as 367 identical copies of a single recording. URLs of copies are saved in a database so that if one copy becomes inaccessible (i.e., the file goes offline), the index can be updated to refer to another copy. Users receive the URL of only one copy in a list of hits so they are not bothered by multiple hits for identical files.
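One simple way to group identical copies and fall back to another URL when a copy goes offline is sketched below in Python. It assumes that "identical" means byte-for-byte identical, so a content hash suffices; the article does not say how FindSounds detects duplicates, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def group_identical_copies(files: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Map a content digest to every URL that serves exactly those bytes.

    ASSUMPTION: duplicates are byte-identical files, so SHA-256 of the
    file content is used as the grouping key.
    """
    copies: Dict[str, List[str]] = defaultdict(list)
    for url, content in files:
        copies[hashlib.sha256(content).hexdigest()].append(url)
    return dict(copies)


def url_to_show(urls: List[str], is_online: Callable[[str], bool]) -> Optional[str]:
    """Return the first copy that is still reachable, or None if all are gone."""
    for url in urls:
        if is_online(url):
            return url
    return None
```

Only the URL returned by `url_to_show` would appear in a list of hits, so users see one entry per recording however many copies exist.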
Over its lifetime, the FindSounds spider has located about 10 million audio files on the Web and about 90% of these were rejected automatically. The remaining one million files, after duplicates are detected, represent about 600,000 different recordings. Of these, auditioners have accepted about 100,000 for inclusion in the FindSounds index. However, because files on the Web become inaccessible over time, the current number of indexed files is about 50,000.
Expanding the Search
FindSounds Palette is a software program introduced by Comparisonics Corporation in 2002 that extends the capabilities of FindSounds.com. It is an audio player, recorder, editor, database, search engine, and Web browser, all in one program. FindSounds Palette provides access to a palette of sounds stored locally and on the Web.
Users can catalog and search audio files stored on their local disks and local area network. A database named "MyPalette" stores information about local audio files. The user may enter the following metadata into MyPalette for each file: description, source, copyright, notes, genre, key, and tempo. In addition, each file may be placed in a class (Effect, Instrument, or Other) and in a category and sub-category. The main window of the program displays a hierarchical view of MyPalette files organized by class, category, and sub-category.
The FindSounds index is accessible from the program and is called "WebPalette." With one query, a user can search MyPalette and WebPalette to find local and remote files satisfying search criteria. Up to 200 MyPalette hits are returned in one list, and up to 200 WebPalette hits are retrieved in another. For each hit, icons are provided for playing the file, opening the file in the audio editor, and launching a sounds-like search using the file as the prototype. Once opened in the audio editor, a WebPalette file can be saved locally to MyPalette.
Sounds in MyPalette and WebPalette are located by keyword, sounds-like, and combined searches. For any search, the user may place restrictions on file format, file size, number of channels, resolution, sample rate, duration, key, and tempo. Keyword searches may apply to any combination of text fields: file name, description, source, copyright, notes, genre, category, and sub-category. The user may specify a desired range of similarity scores in a sounds-like search.
Users can search not only the sounds of local and remote files, but also sounds obtained by changing the speeds of these recordings. Each file in MyPalette may be indexed at its normal speed and 24 speed variations: the normal speed increased by one to 12 semitones (one octave) and decreased by one to 12 semitones. This has the effect of multiplying the size of the local audio collection, but without occupying additional disk space because each audio file is stored only once, at its normal speed. A collection of 10,000 local audio files thereby becomes a searchable database of 250,000 sounds. The sound that the user is seeking may already be on the user's hard drive but has yet to be heard by human ears.
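The arithmetic behind these speed variations can be checked with a short Python sketch. It relies on the standard equal-tempered relationship in which a shift of n semitones corresponds to a playback speed ratio of 2^(n/12), so that 12 semitones (one octave) doubles or halves the speed; the function names are hypothetical.

```python
from typing import Dict

SEMITONES_PER_OCTAVE = 12


def speed_ratio(semitones: int) -> float:
    """Playback speed relative to normal for a shift of n semitones."""
    return 2.0 ** (semitones / SEMITONES_PER_OCTAVE)


def speed_variations(max_shift: int = 12) -> Dict[int, float]:
    """Normal speed (0) plus every shift from -max_shift to +max_shift."""
    return {n: speed_ratio(n) for n in range(-max_shift, max_shift + 1)}


def searchable_sounds(file_count: int, max_shift: int = 12) -> int:
    """Number of distinct sounds once every file is indexed at every speed."""
    return file_count * len(speed_variations(max_shift))
```

With the default of 12 semitones in each direction there are 25 speeds per file, so `searchable_sounds(10_000)` gives the 250,000 sounds mentioned above. Only the index grows; the audio itself is stored once.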
Each WebPalette file is indexed at more than 40 speeds. The 50,000 sounds in the FindSounds index become a searchable collection of 2,000,000 sounds, which amounts to more than 1500 hours of audio. Users can find many interesting matches in this expanded collection. A speed variation is indicated in a list of hits by a number of semitones that is positive if the variation is faster than normal speed and negative if it is slower. The Comparisonics waveform display is colored to represent the sound of the speed variation, and when a user clicks on the play icon, the recording is played at the indicated speed.
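As a rough consistency check on these figures, the Python sketch below multiplies out the collection size and derives the average duration the totals imply. The article gives only lower bounds ("more than 40 speeds," "more than 1500 hours"), so the constants are treated as approximations and the result is indicative only.

```python
INDEXED_FILES = 50_000   # from the article
SPEEDS_PER_FILE = 40     # the article says "more than 40"; 40 is used as a floor
TOTAL_HOURS = 1500       # the article says "more than 1500 hours"


def searchable_sounds(files: int, speeds: int) -> int:
    """Size of the expanded collection."""
    return files * speeds


def mean_duration_seconds(total_hours: float, sounds: int) -> float:
    """Average playing time per sound implied by the totals."""
    return total_hours * 3600 / sounds
```

Here `searchable_sounds(INDEXED_FILES, SPEEDS_PER_FILE)` is 2,000,000, and the implied mean duration is about 2.7 seconds per sound, which is plausible for an index limited to recordings of 10 seconds or less.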
In the audio editor, the user can play, record, and edit an audio file while viewing its Comparisonics waveform display. Editing operations include cut, copy, paste, mix, delete, fade, adjust volume, change speed, undo, and redo. In addition, metadata describing a MyPalette file may be entered and edited. The user may pan and zoom the waveform display; its colors help the user to "see" the sounds. The user may select any sound by highlighting it in the waveform display. Clicking on the sounds-like search icon retrieves sounds in MyPalette and WebPalette that are similar to the selected sound. A user can be recorded mimicking a desired sound and the recording can be edited or speed-changed to "fine tune" it before launching a search for similar sounds. When a local sound is used as the prototype in a WebPalette search, a signature is computed to characterize the sound, and it is the signature, not the voluminous audio data, that is communicated over the Internet to the FindSounds query processor.
In Figure 2a below is the Comparisonics waveform display of a recording of a whale that has been speeded up by four semitones. The first part of the recording has been selected (indicated by the black background) and is used as the prototype in a sounds-like search of WebPalette. Figure 2b shows a list of hits in order of decreasing similarity score. Because the hits sound similar to the prototype, their waveforms have similar colors. Each hit is a speed variation indicated by a positive or negative number of semitones. In this example, the prototype matched speed-altered recordings of whales, loons, a sparrow, a mosquito, human burps, radar beeps, a bell, a whimpering gorilla, a screaming toad, a Japanese wood flute, radio beacons, and the routing tone used by the Irish telephone system.
Figure 2a. Selecting a prototype sound in the Comparisonics waveform display.
Figure 2b. List of hits for a sounds-like search in FindSounds Palette.
Future Directions
Computer technology has contributed to the "democratization" of multimedia production. Music composition and movie editing can be accomplished using personal computers and millions of people are embracing the opportunity. Creative people seek the best access to the most sounds. FindSounds.com and FindSounds Palette have succeeded in increasing the access to sounds; however, there is more that can be done.
Today there are countless hardware and software devices for electronically synthesizing and transforming sounds, offering limitless possibilities. However, these devices currently have no mechanism in place for searching the sounds they produce. A user explores the sounds of a synthesizer by the tedious process of setting parameters, playing a sound, changing the parameters, playing another sound, changing the parameters again, and so on. Wouldn't it be wonderful to perform a sounds-like search of the universe of sounds that a synthesizer can produce? The user could examine a list of hits, quickly audition any sound in the list, and obtain the parameter settings used to generate each sound. This concept could also be applied to manual sound-making devices (like the gadgets used on Foley stages) to discover sounds and the recipes for producing them.
Collections of audio recordings are untapped resources. With only meager access afforded by keyword searches, thousands of sounds remain hidden. Millions more sounds can be derived automatically from these collections (via speed change and other transformations), but are unsearchable without content-based retrieval.
In the year 1624, Sir Francis Bacon wrote New Atlantis in which he describes his vision of the future. We close with an excerpt that is prophetic.
© 2015 Comparisonics Corporation