Effects and illusions
Auditory perception is a complex phenomenon in which the physical characteristics of the received sound, the physiology of the ear and the neural activity of the brain interact in subtle ways. On these pages we wish to show that auditory perception should not be considered a faithful image of the received sound but rather the result of complex (and not very faithful) processing of it.
Here we distinguish between effects and illusions. By effects we mean "non-faithful" perceptions due to the physiological characteristics of the receiving apparatus, while illusions, like the more famous optical illusions, are a product of the direct action of the brain, which does not limit itself to registering the sounds it receives but also interprets them.
The Shepard scale
The Shepard scale,[1] of which there exist many versions, is a classic example of an auditory illusion. Listen to the example below, which reproduces the scale as a glissando, and try to keep track of the trend of the note you perceive: you will hear an apparently endless descending tone. In fact, as an analysis of the sonogram reveals, the sound is periodic: the higher harmonics are gradually re-introduced while the lower ones gradually disappear. The brain links pitch perception to the overall descending trend rather than to the fundamental of the sound.
"Shepard" effects are also used in music in the form of circle progressions (e.g. in Great Fantasia and Fugue in G minor BWV 542 for the organ by J. S. Bach) or other tricks.
- In the final part of the song Echoes by Pink Floyd, a male chorus glissando was cut into loops and mixed to give the illusion of a continuous rise; it emerges from the fade-out of a long repetition led by an electric guitar;
- The album A Day at the Races by Queen offers the opposite of our example below; it opens and closes circularly with a short trio of instruments, in which we hear a Shepard scale executed by an electric guitar.
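The construction behind the Shepard glissando can be sketched in a few lines. This is only an illustrative model, not the recipe used for the example on this page: it assumes sinusoidal components spaced one octave apart under a fixed raised-cosine loudness envelope on a logarithmic frequency axis. The function name `shepard_components` and the parameters `n_octaves` and `f_min` are hypothetical choices made for the sketch.

```python
import math

def shepard_components(phase, n_octaves=8, f_min=20.0):
    """Return sorted (frequency, amplitude) pairs for one instant of a
    descending Shepard glissando. `phase` is the position within one cycle
    (0.0 = start, 1.0 = start of the next cycle)."""
    components = []
    for k in range(n_octaves):
        # Position of this component in octaves above f_min. It decreases
        # as phase grows (falling glide); the modulo moves a component that
        # leaves the bottom of the range back to the top.
        pos = (k - phase) % n_octaves
        freq = f_min * 2.0 ** pos
        # Raised-cosine envelope: silent at the edges, loudest mid-range,
        # so components fade in at the top and fade out at the bottom.
        amp = 0.5 * (1.0 - math.cos(2.0 * math.pi * pos / n_octaves))
        components.append((freq, amp))
    return sorted(components)
```

Under these assumptions every component slides downwards within a cycle, yet `shepard_components(1.0)` returns exactly the same spectrum as `shepard_components(0.0)`: the sound has come back to its starting point although the ear has followed a continuous descent.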
The missing fundamental
The example below has two paradoxical features. We hear the sequence C3 - D3 - E3 - F3 - G3 - G2 - G1 - G0 repeated twice. The initial C3 corresponds to a pure tone at 261 Hz, while the four final Gs in descending octaves correspond to about 384, 192, 96 and 48 Hz.
The two sequences seem to have a different timbre, but the notes being played seem to be the same (i.e. they seem to have the same pitch). This is the first surprise. We could think that two different instruments are executing the same score. However, if you examine the sonogram, you will find that the only difference between the two sequences is that in the second the fundamental of every sound has been eliminated. This shows that the brain assigns the pitch of a sound on the basis of the ratios between the harmonics of the entire spectrum rather than on the fundamental alone. Pitch is attributed to the frequency corresponding to the fundamental even if it was not emitted (see the discussion in A dialogue: "On the perception of pitch of composite sounds").
note | frequency (Hz)
G3 | 384
G2 | 192
G1 | 96
G0 | 48
We promised two effects in a single example. The second is implicit, although you can hear it by listening to the example through the speakers of your computer instead of through headphones. The surprising effect is this: nothing changes! You will continue to perceive and recognise the same sequence of notes. The small speakers of a computer cannot reproduce sounds below 50-60 Hz; nevertheless, the last G at 48 Hz is still perceived at the same pitch. We have thus involuntarily discovered a way to improve the performance of speakers by exploiting the help that comes from our perceptual system.
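One way to see why the pitch survives the removal of the fundamental is to note that the remaining harmonics are still all multiples of the missing frequency, so the summed waveform still repeats at that rate. The sketch below uses the greatest common divisor of the partials as a crude stand-in for this repetition rate. It is not a model of how the brain actually assigns pitch; the helper names and the choice of six harmonics are assumptions made for illustration.

```python
from functools import reduce
from math import gcd

def harmonics(fundamental_hz, count):
    """First `count` harmonics (integer Hz) of a tone, fundamental included."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def implied_fundamental(partials_hz):
    """Greatest common divisor of the partials: the repetition rate of the
    summed waveform, used here as a rough proxy for the perceived pitch."""
    return reduce(gcd, partials_hz)

# The four Gs from the table, with and without their fundamental.
for note, f0 in [("G3", 384), ("G2", 192), ("G1", 96), ("G0", 48)]:
    full = harmonics(f0, 6)
    stripped = full[1:]  # same tone with the fundamental removed
    print(note, implied_fundamental(full), implied_fundamental(stripped))
```

For each note the two printed values coincide. For G0, for instance, the partials 96, 144, 192, 240 and 288 Hz that a small speaker can still reproduce have 48 Hz as their greatest common divisor.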
A "non-artificial" example of a missing fundamental occurs in the perception of the intonation of the timpani. In this case too, the note we primarily hear corresponds to a harmonic that does not appear in the instrument's spectrum (see the real sound of the timpani).
Sound sense and musical sense
The following simple example shows that our auditory system processes sound information in completely different ways in order to recognise a song (musical sense) or an instrument (sound sense).
The information about a song is contained in its high-level structure, such as the note sequence, the melody, the harmony or the composition. All of these characteristics fall within its musical aspect and require a "linguistic" analysis of the perceived sound.
The information about an instrument, on the other hand, concerns the perception of timbre and is encoded in the acoustic characteristics of each sound making up the song, which can be visualised, for example, in its spectrogram.
"The Entertainer" by Scott Joplin |
original version |
The original order of the notes is reversed but the individual notes have not been changed | you can recognise the instrument but not the song
The original order of the notes is the same but each individual note is played backwards in time | you can recognise the song but not the instrument
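The two manipulations compared above act on different levels of the recording, which a toy model makes explicit. The sketch assumes the song is already available as a list of notes, each note being a list of samples; splitting a real recording into notes is a separate problem that is not addressed here. The function names and the sample values are made up for illustration.

```python
def reverse_note_order(notes):
    """Play the notes last-to-first; each note's samples stay untouched.
    The timbre of every note is preserved, the melody is not."""
    return list(reversed(notes))

def reverse_each_note(notes):
    """Keep the note order; play each note's samples backwards in time.
    The melody is preserved, the attack and decay of every note are not."""
    return [list(reversed(samples)) for samples in notes]

# Toy "recording": three notes, each with a sharp attack followed by a decay.
song = [[9, 5, 2], [8, 4, 1], [7, 3, 0]]

print(reverse_note_order(song))  # [[7, 3, 0], [8, 4, 1], [9, 5, 2]]
print(reverse_each_note(song))   # [[2, 5, 9], [1, 4, 8], [0, 3, 7]]
```

In the first output every note still starts with its loud attack, which is what lets us recognise the instrument. In the second the notes follow one another in the original order, which is what lets us recognise the song.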
Tartini's third sound
The third sound is a "ghost" sound that is sometimes perceived when two intense sounds (rich in harmonics) reach the ear simultaneously. It is rather common to obtain this sound on the violin when playing double stops on the first and second strings.[2] The effect actually takes on various forms because the third sound appears at frequencies equal to the sum of (integer multiples of) the fundamental frequencies, to their difference and in other combinations. In the first example below, the third sound can be interpreted as an effect of a missing fundamental (see the previous section). In the example, we can hear the superposition of two triangular waves at an interval of a natural major third (see the page on the natural scale). Their fundamentals, at 512 and 640 Hz, are in a ratio of 4:5. This particular ratio creates a superposition of harmonics at regular intervals, whose greatest common divisor is 128 Hz (a sound that is two octaves lower than the fundamental at 512 Hz).
In the second example below, we first present the sounds of the components at 512 and 640 Hz separately. Then we present them together. The third sound is only audible through headphones if the volume is high enough. In the image above, we see the spectrogram of the composite sound. The reference grid highlights the multiples of the missing fundamental at 128 Hz.
The third sound can also be used to check the natural intonation of an interval. In the following example, the frequency of the higher note shifts by ±20 Hz. The difference tone oscillates correspondingly. The perceived beat is at a minimum when the component notes are tuned to a natural major third.
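The arithmetic behind the 128 Hz reference grid and the shifting difference tone can be checked with a short sketch. It assumes ideal triangular waves, which contain only the odd harmonics of their fundamental, and an arbitrary upper limit of 5000 Hz for the partials listed; the function names are hypothetical.

```python
from functools import reduce
from math import gcd, log2

def odd_harmonics(fundamental_hz, limit_hz):
    """Partials of an ideal triangular wave: odd multiples of the fundamental."""
    top = limit_hz // fundamental_hz
    return [fundamental_hz * n for n in range(1, top + 1, 2)]

def difference_tone(low_hz, high_hz):
    """Simplest combination tone: the difference of the two fundamentals."""
    return high_hz - low_hz

low, high = 512, 640
partials = sorted(odd_harmonics(low, 5000) + odd_harmonics(high, 5000))
grid = reduce(gcd, partials)    # spacing of the grid on which all partials lie
octaves_below = log2(low / grid)

print(partials)
print(grid, octaves_below)      # 128 2.0
print([difference_tone(low, h) for h in (620, 640, 660)])  # [108, 128, 148]
```

All partials fall on multiples of 128 Hz, two octaves below the lower note. When the upper note is detuned by ±20 Hz the difference tone moves between 108 and 148 Hz, and it coincides with the 128 Hz grid only at the natural major third.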
While the missing fundamental is substantially an illusion, in some cases the third sound can be interpreted as a physical effect. It is also produced in cases in which a missing fundamental cannot be used as an explanation. This fact has been interpreted as proof of the non-linear behaviour of the human ear. Briefly, if the ear does not behave linearly (this happens when the input signal is very intense), it can distort the signal. This means that the ear can add frequencies to the incoming signal that are not part of the signal itself. These frequencies are not illusory but exist physically inside the ear and are perceived in correspondence to the physical maxima of the cochlear pressure wave. In particular, there is a type of distortion, called intermodulation distortion and well known to people who work with radio transmissions, that produces frequency components equal to the sum and the difference of the frequencies of the incoming sounds.
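Intermodulation distortion can be reproduced numerically with a deliberately simple model. The sketch below assumes a quadratic non-linearity, y = x + 0.2·x², which is an arbitrary choice for illustration and not a model of the cochlea; the sample rate, the coefficient 0.2 and the function names are likewise assumptions. The amplitude at a given frequency is measured by correlating the signal with a sine and a cosine at that frequency.

```python
import math

RATE = 8192   # samples per second (assumed; well above twice the highest partial)
N = RATE      # one second, so every integer frequency fits whole cycles

def amplitude(signal, freq_hz):
    """Amplitude of the component at freq_hz, by correlation with cos and sin."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / RATE)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / RATE)
             for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

def two_tones(n):
    """Sample n of two pure tones at 512 and 640 Hz, unit amplitude each."""
    t = n / RATE
    return math.sin(2 * math.pi * 512 * t) + math.sin(2 * math.pi * 640 * t)

linear = [two_tones(n) for n in range(N)]
distorted = [x + 0.2 * x * x for x in linear]   # assumed quadratic distortion

for f in (128, 512, 640, 1152):
    print(f, round(amplitude(linear, f), 3), round(amplitude(distorted, f), 3))
```

The undistorted signal contains energy only at 512 and 640 Hz. After the quadratic distortion, new components appear at the difference (128 Hz) and at the sum (1152 Hz) of the two frequencies, each with amplitude 0.2 in this model, even though neither was present in the input.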
The McGurk effect
Observe the video on the right and try to guess what syllable is being pronounced. Then try listening to the audio without watching the video. You will hear something completely different. As with all perceptual illusions, this one conveys deliberately ambiguous or contradictory information to the brain, which adjusts it in a way that is not simply a faithful image of reality. In this case, the video shows a man saying "ga-ga" three times, whereas the audio track is a recording of the sound "ba-ba" three times. Now, to pronounce the letter "b", we know that our lips have to touch, which is not happening in this video. The brain resolves the conflict by making us perceive an intermediate "da-da" sound or, sometimes, the visual version "ga-ga". The inverse of this same effect partly explains why we are not very disturbed by the asynchrony between the lip movement and the audio in foreign films that have been dubbed into our mother tongue. We simply ignore the incomprehensible part of the message (the mouthing of the foreign language) in favour of the comprehensible part (the dialogue in our mother tongue).
Other illusions ad libitum
Since the field of the psychology of perception is too vast to be dealt with here and acoustic illusions are not as famous as optical illusions, we suggest you visit this website ([1]) belonging to a well-known American professor, which contains both examples of acoustic illusions and an extensive bibliography.
In particular, we suggest:
- The tritone paradox: a tritone is an interval exactly equal to half an octave. The paradox is created by playing two Shepard sounds in succession at this interval. The pair is perceived ambiguously, even by expert musicians: some perceive the interval as ascending, others as descending. On this webpage, http://www.cs.ubc.ca/nest/imager/contributions/flinn/Illusions/TT/tt.html, you will find a Java applet that you can use to experiment quantitatively with this effect.
- The mysterious melody: shows the importance of recognition (as opposed to the pure recording of sounds). It demonstrates the well-known fact that it is much easier to follow a song if we know it, as well as the lesser-known fact that spreading the notes of a known melody across different octaves makes the melody less familiar or even unrecognisable.
A splendid application of a similar "effect" can be found in Anton Webern's arrangement of the "Ricercar a 6" from J. S. Bach's Musikalisches Opfer. The famous theme is divided amongst various instruments and across several octaves. In particular, the first five notes are played on the trombone, then two on the horn in F, two on the trumpet in C, one on the horn again with a touch of harp, then four on the trombone (the first with the horn), three on the horn and the last two on the trumpet and harp together. Even though the composition is based on the same notes, it has become a completely new piece of music. Notice that all the brass instruments are played with mutes, which make their differences in timbre much less pronounced and favour a smooth switch from one voice to another. (For acoustic examples of the mute, see the page on the trumpet.)
The original theme given to Bach by King Frederick II of Prussia
Links
For a look at other important effects, see the following pages on:
Along with Diana Deutsch's website ([2]), the website of the Kyushu Institute of Design has various acoustic files demonstrating different effects.
- [1] Roger N. Shepard, Journal of the Acoustical Society of America 36, 2346 (1964), doi:10.1121/1.1919362
- [2] It is for this reason that the discovery of this effect is attributed to Giuseppe Tartini (1692-1770), a violinist, teacher and composer particularly famous for his "Devil's Trill Sonata". He was also interested in acoustics and the theory of consonance, on which he wrote various treatises. The third sound is described in his Trattato di musica secondo la vera scienza dell'armonia [Treatise on Music According to the True Science of Harmony], Padova, 1754.