Physiology of the auditory system

From "Fisica, onde Musica": a website on the physics of waves and sound, the acoustics of musical instruments, musical scales, harmony and music.

The physiological mechanisms on which sound perception is based are still not completely understood. As with all human perceptual processes, sound perception is not a faithful transformation of the physical characteristics of sound waves but rather an active process that interprets and reconstructs pressure waves in terms of: 1) perceived qualities, such as pitch, loudness and timbre, or, on a more cognitive level, euphony or cacophony; and 2) meaning (e.g. spoken language) or emotional aspects, such as those connected to music. The active nature of this process is revealed by the existence and characteristics of acoustic illusions. For a better understanding of the basic mechanisms underlying this process, albeit within the limits imposed by a website such as this one, you must read:

You can also directly experience various perceptual phenomena on the pages Acoustic effects and illusions, Beats, Critical bands and Masking.

Physiology of pitch perception

With which physiological processes does the human ear attribute a precise pitch sensation to a sound? Before looking for an answer to this question on this page, we advise you (if you haven't done so already) to read the pages on:

Below we illustrate two physiological models of pitch perception, both of which share a common, experimentally proven base. Sound perception occurs through the following phases:

  1. the pinnae receive sound waves and convey them along the auditory canal to the tympanum;
  2. the vibration of the tympanum is transmitted by a system of levers (the ossicular chain) to a membrane that covers an opening called the oval window, which is the doorway to the cochlea (a spiral-shaped, fluid-filled canal);
  3. the vibration of the membrane of the oval window is transmitted to the fluid in the cochlea in the form of pressure waves;
  4. the pressure waves are transferred by the cochlear fluid to the basilar membrane inside the cochlea;
  5. the hair-cells of the Organ of Corti lying on the basilar membrane are set in motion;
  6. the deflection of the hair-cells induces the production of “electric messages” (action potentials), which are sent to the central nervous system via the fibres of the auditory nerve;
  7. the auditory information is processed in the brain.

Von Helmholtz's positional theory

Hermann von Helmholtz believed that the mechanism for discriminating frequencies was positional: [1] sounds of different frequencies make different regions of the basilar membrane vibrate.

[Figure: Basilar membrane and frequencies]

In particular, high-frequency sounds make the basilar membrane vibrate close to its origin at the oval window (image on the left), while lower-frequency sounds make it vibrate progressively farther from the stapes and the oval window, towards the apex at the helicotrema. Helmholtz, an excellent physicist and physiologist, believed that the basilar membrane was composed of a bundle of transversal fibres whose elasticity and mass per unit of length varied progressively from the oval window towards the helicotrema.

[Figure: Coiled basilar membrane]

According to Helmholtz's hypothesis, when these fibres are hit by a fluid pressure wave with a particular frequency, they behave like independent forced harmonic oscillators and resonate at specific frequencies. Recalling that the resonance frequency of a harmonic oscillator is:

f_0 = \frac{1}{2\pi} \sqrt{\frac{k}{m}}

this depends on the elastic constant k of the material and on the mass m of the oscillator. According to Helmholtz, the morphological variations of the basilar membrane (which thickens as it moves from the oval window to the helicotrema, as illustrated in the image on the right) determine the different resonance frequencies of the different regions.
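
To make the dependence on k and m concrete, here is a minimal sketch of the resonance formula above. The function name and the numerical values are illustrative assumptions, not measured properties of the basilar membrane.

```python
import math


def resonance_frequency(k, m):
    """Resonance frequency (Hz) of a harmonic oscillator.

    k: elastic constant in N/m, m: mass in kg (illustrative values only).
    """
    return math.sqrt(k / m) / (2 * math.pi)


# Hypothetical fibre: the numbers are chosen only to show the scaling.
k, m = 1.0, 1.0e-6
base = resonance_frequency(k, m)
stiffer = resonance_frequency(4 * k, m)   # four times stiffer
heavier = resonance_frequency(k, 4 * m)   # four times heavier

print(f"base:    {base:8.1f} Hz")
print(f"stiffer: {stiffer:8.1f} Hz")
print(f"heavier: {heavier:8.1f} Hz")
```

Quadrupling the stiffness doubles the resonance frequency, while quadrupling the mass halves it. In Helmholtz's picture this is how a gradual change in the fibres along the membrane would map onto a gradual change in the resonance frequency.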

[Figure: Envelope of a complex wave]

Although it has been substantially confirmed by later observations by Georg von Békésy, [2] Helmholtz's "positional theory" is an excessively simplified model. The Hungarian physicist (winner of the Nobel Prize in 1961 for his studies of the physiology of the inner ear) was able to observe the movement of the basilar membrane with a refined system of electrodes. His observations can be summarised by stating that:

  1. a sound wave generates a complexly shaped "travelling wave" in the basilar membrane that rapidly decays;
  2. the envelope (image on the right) of this travelling wave has a peak that falls in various regions of the basilar membrane according to the frequencies and positions predicted by Helmholtz;
  3. in the case of composite sounds, the ear can execute a sort of spectral analysis; i.e. it can discriminate between different wave components that make distinct parts of the basilar membrane vibrate;
  4. the preceding figure, which shows the regions of greatest deformation of the membrane as a function of frequency, indicates that the length of membrane reserved for an octave (i.e. for each doubling of frequency) is roughly constant. As frequency increases, the ratio between the width of the region involved and the "range" of frequencies to be perceived is therefore drastically reduced. This has notable consequences for the ability of our ear to discriminate two superimposed high-frequency sounds.
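
The last point can be illustrated with a short calculation. This is only a sketch: it assumes that every octave occupies exactly 4.5 mm of membrane (a value inside the 4-5 mm range quoted later on this page), and the function name is hypothetical.

```python
MM_PER_OCTAVE = 4.5  # assumed constant length of membrane per octave


def mm_per_hz(low_hz, mm_per_octave=MM_PER_OCTAVE):
    """Length of membrane available per hertz in the octave starting at low_hz."""
    octave_width_hz = 2 * low_hz - low_hz  # an octave spans low_hz .. 2 * low_hz
    return mm_per_octave / octave_width_hz


for low in (100, 200, 400, 800, 1600, 3200):
    micrometres = 1000 * mm_per_hz(low)
    print(f"{low:5d}-{2 * low:5d} Hz: {micrometres:6.2f} micrometres per Hz")
```

Under this assumption, each step up by one octave halves the length of membrane available per hertz, which is why two nearby high-frequency sounds excite regions that overlap more.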

Successes and limits of the positional theory

Many observed perceptual phenomena are explained by the positional theory:

  • the existence of a range of audible frequencies
the link between pitch perception and the morphology of the basilar membrane explains why only a certain range of frequencies is audible. The elasticity and mass per unit of length of the membrane determine the "resonating" and, therefore, audible frequencies.
  • the perception of musical intervals as ratios of frequencies (and not as differences)
if the width of the region reserved for each octave is constant, we would expect all octaves to be perceived as identical intervals, since they span the same distance along the basilar membrane. This explanation is certainly simplistic and is refined by the periodicity theory.
  • the phenomenon of critical bands
this perceptual phenomenon, illustrated on the page about critical bands, occurs when listening simultaneously to two sounds with very close frequencies. According to the positional theory, the "perceptual confusion" into which the auditory system falls in these circumstances arises because sounds close in frequency activate partially overlapping regions of the basilar membrane and can therefore send uncorrelated nerve impulses to the same fibres.
  • the widening of the critical band at high frequencies
if, as we said, the width of the region of the basilar membrane available per unit of frequency decreases as frequency increases, it is clear that the overlap of the regions of the basilar membrane mentioned in the preceding point is greater at high frequencies.
  • the phenomenon of masking
this is the missing (or weakened) perception of one of two simultaneous sounds. Here, too, the explanation can be given in terms of overlapping envelopes. Given how important the effects of masking are in music, we have dedicated an entire page to it.

The limits of the positional theory can be summarised by its inability to explain:

  • the fusion of a sound composed of exactly harmonic partials into a single sound
if the various harmonics of a composite sound activate different regions of the basilar membrane, why do we perceive a single sound?
  • the perception of virtual pitch
if the fundamental harmonic is absent, how can we perceive virtual pitch without any specific stimulation of the corresponding region of the basilar membrane?
  • the perception of beats when two sounds close in frequency are received separately, one by each ear
obviously, in this case, one cannot attribute the perception (albeit weak) of beats to a beat phenomenon occurring in the basilar membrane itself (each membrane receives a single sound!). The blending of sounds and the perception of beats evidently take place at a processing level beyond that of the cochlea.
  • the great ability to discriminate the frequencies of two sounds that are not superimposed but received in very rapid succession
how can the system be both selective in frequency and fast in its response? This is in contrast to the theory of resonance. To respond quickly, the oscillator (transversal fibre) must rapidly damp its oscillation in order to be ready to respond to the changed external stimulus (the second sound in succession); however, rapid damping means "broader resonances" (on this point, see the resonance curves presented on the pages Resonance and Impedance of a harmonic oscillator).
  • the results of several experiments (first and foremost, one using a modified Seebeck siren), in which pitch perception of periodic sounds is not determined by a fundamental harmonic.
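
The conflict between frequency selectivity and fast response described above can be put into numbers. The sketch below assumes a lightly damped, linear harmonic oscillator, for which the amplitude decays with a time constant of 1/(π·Δf), where Δf is the full width at half maximum of the power resonance curve. The function name is hypothetical and a real cochlea is not this simple.

```python
import math


def ring_down_time(bandwidth_hz):
    """Amplitude decay time constant (s) of a lightly damped oscillator.

    bandwidth_hz is the full width at half maximum of its power resonance curve.
    """
    return 1.0 / (math.pi * bandwidth_hz)


for bandwidth in (1.0, 10.0, 100.0):
    tau_ms = 1000 * ring_down_time(bandwidth)
    print(f"resonance width {bandwidth:6.1f} Hz -> ring-down time {tau_ms:7.2f} ms")
```

A narrow resonance (good selectivity) keeps ringing for a long time, while a quickly damped oscillator necessarily has a broad resonance. This is the trade-off that a purely resonant model of the basilar membrane cannot escape.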

Finally, there is a limit to the positional theory that we can define as constitutive: the theory is inevitably incomplete because it is still not entirely understood which movements of the basilar membrane are important for the stimulation of hair-cells. Furthermore, recent studies have shown that these hair-cells are capable of autonomous contraction, which further complicates the understanding of how the mechanisms of the inner ear work.

The periodicity theory

The main assumption of the periodicity theory is that the sensation of pitch is also produced through a time analysis of sound and, in particular, of the repetition rate with which auditory nerve signals reach the brain. At first sight, the theory appears much more "natural" than the positional theory. What could be better for evaluating the frequency of a sound than directly counting the number of impulses per unit of time that the oscillation of the basilar membrane sends to the auditory nerve? This avoids transforming time information, such as frequency, into spatial information, as happens in the positional theory.

This simplification must be avoided for several reasons:

  • the experimental observations of von Békésy unequivocally show the existence of the spatial correlation proposed by the positional theory;
  • the "naive" view that auditory nerve signals are synchronised with the frequency of basilar membrane vibrations has been partially superseded by experimental observations (see the studies of Ichiji Tasaki[3]) showing that individual nerve fibres, which are connected to the basilar membrane and excited by hair-cells, do not faithfully reproduce the period of the wave exciting them. In the figure below, we can see how individual nerve fibres are sometimes not excited even in the presence of a maximal deformation of the basilar membrane.

[Figure: Impulses in nerve fibres]

However, the brain can process signals coming from a large number of nerve fibres collected in a single auditory nerve (it is believed that the density of receptors is about 100 per millimetre of basilar membrane and that the length of the basilar region corresponding to an octave is about 4-5 millimetres) and recognise an almost periodic "pattern" of impulses (as seen in the last line of the figure on the left). It appears that a sophisticated statistical mechanism called temporal autocorrelation acts in the brain: by comparing the current pulse train with the preceding one, it can highlight the periodic characteristics of a pattern and suppress all others. This would also explain the "readiness" to sense small variations of frequency between consecutive sounds. Temporal autocorrelation becomes less effective at high frequencies; in those conditions, the increased density of the pulse trains makes the periodicity of the "pattern" harder to perceive.

  • the model of the periodicity theory is not an extension of the positional model; it should therefore not be considered a more general model with greater explanatory power than the positional model. Rather, it describes an alternative strategy that the brain seems to use to extract as much information as possible from the sound waves that perturb the perceptual system.
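
The idea that a pooled pulse train reveals a periodicity that no single fibre reproduces can be sketched with a toy model. Everything in it is an assumption made for illustration: each hypothetical fibre fires on only one cycle in three, different fibres are staggered, and the "autocorrelation" is a plain sum of products at a given lag. It is not a model of real neural processing.

```python
def pooled_spike_train(n_fibres, n_cycles, period, skip):
    """Sum of spikes from n_fibres; each fibre fires once every `skip` cycles.

    period is the length of one stimulus cycle in samples.
    """
    train = [0] * (n_cycles * period)
    for fibre in range(n_fibres):
        for cycle in range(n_cycles):
            if (cycle + fibre) % skip == 0:  # staggered firing between fibres
                train[cycle * period] += 1
    return train


def autocorrelation(train, lag):
    """Unnormalised autocorrelation of the train at the given lag (in samples)."""
    return sum(train[i] * train[i + lag] for i in range(len(train) - lag))


def best_lag(train, max_lag):
    """Lag (in samples) with the largest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(train, lag))


PERIOD = 10  # samples per stimulus cycle (e.g. 100 Hz sampled at 1000 Hz)
single = pooled_spike_train(n_fibres=1, n_cycles=50, period=PERIOD, skip=3)
pooled = pooled_spike_train(n_fibres=6, n_cycles=50, period=PERIOD, skip=3)

print("single fibre, strongest lag:", best_lag(single, 40))
print("six fibres,   strongest lag:", best_lag(pooled, 40))
```

In this toy model the single fibre repeats only every three cycles, whereas the pooled train repeats at the period of the stimulus, which is the information a time analysis would need.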

Successes and limits of the periodicity theory

Amongst the (partial) successes of the periodicity theory, we can include:

  • the explanation of the results of several experiments in which pitch perception of periodic sounds is not determined by a fundamental harmonic but actually by a type of "sub-harmonic" having a frequency equal to half of the fundamental harmonic.

as an example of this type of experiment, we suggest the one proposed by John R. Pierce. [4] Two pulse trains of equal frequency (e.g. 100 impulses per second) are sent to the ear; in the second train, however, the pulses alternate in sign (i.e. they correspond alternately to an increase and a decrease of air pressure).

[Figure: Pierce's experiment]

If we reason according to the positional theory, the ear will carry out a spectral analysis of the sound and perceive the pitch of the fundamental. This fundamental will correspond to the lowest frequency approximation of the signal, as depicted by the sine waves in the picture. In case A, we should perceive a sound frequency of 100 Hz and in case B, a sound of 50 Hz (considering that the frequency of the first harmonic is halved).

If we reason according to the periodicity theory, we should perceive a frequency of 100 Hz in both cases (seeing that the basilar membrane receives 100 impulses per second).

At first sight, we find ourselves in a classic experimentum crucis that will decide which of the two alternative theories is valid. The final judge is the experiment itself! Its results are absolutely surprising and confirm the complementarity of the two theories more than their opposition:

  1. at low frequencies (100 impulses per second), the two sounds are perceived with the same pitch (as predicted by the periodicity theory);
  2. at higher frequencies (200 impulses per second), sound B is perceived as lower than sound A (exactly an octave below, as predicted by the positional theory);
  3. for intermediate frequencies, pitch perception of sound B is ambiguous. A conflict occurs between the evaluation set by the repetition rate of the impulses and that set by the region of maximum deformation of the basilar membrane.
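
The two competing predictions for Pierce's pulse trains can be reproduced with a small sketch. The sample rate, the function names and the use of the waveform's repetition period as a stand-in for the fundamental of a spectral analysis are all assumptions made for illustration.

```python
def pulse_train(pulses_per_second, sample_rate, duration_s, alternate_sign):
    """Unit pulses at a regular spacing; optionally every other pulse is negative."""
    spacing = sample_rate // pulses_per_second
    n_samples = sample_rate * duration_s
    train = [0] * n_samples
    for index, position in enumerate(range(0, n_samples, spacing)):
        train[position] = -1 if (alternate_sign and index % 2 == 1) else 1
    return train


def pulse_rate(train, sample_rate):
    """Pulses per second, ignoring sign (what a pure pulse counter would see)."""
    return sum(1 for s in train if s != 0) * sample_rate / len(train)


def waveform_frequency(train, sample_rate):
    """Frequency (Hz) of the shortest period after which the waveform repeats."""
    n = len(train)
    for shift in range(1, n // 2 + 1):
        if all(train[i] == train[i + shift] for i in range(n - shift)):
            return sample_rate / shift
    return None


SAMPLE_RATE = 1000  # samples per second (assumed)
sound_a = pulse_train(100, SAMPLE_RATE, 1, alternate_sign=False)
sound_b = pulse_train(100, SAMPLE_RATE, 1, alternate_sign=True)

for name, sound in (("A", sound_a), ("B", sound_b)):
    print(name, "pulse rate:", pulse_rate(sound, SAMPLE_RATE),
          "waveform frequency:", waveform_frequency(sound, SAMPLE_RATE))
```

Both trains deliver 100 pulses per second, but the waveform of sound B repeats only every two pulses, so its fundamental is an octave lower. The experiment asks which of these two quantities the ear actually follows.
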
  • a possible explanatory mechanism for the perception of virtual pitch
The time analysis of two superimposed harmonics (e.g. of frequencies 2f and 3f) can make the brain perceive a higher-order periodicity (with a longer period and, therefore, a lower frequency) than that of the individual harmonics.
This is roughly what happens (simplifying a lot) when we observe two lights that are turned on periodically, one every two seconds and the other every three seconds: we can perceive that the event "both lights turn on simultaneously" happens every six seconds. If, instead of thinking in terms of period, we think in terms of frequency, the frequency of the event "both lights turn on simultaneously" is the greatest common divisor of the frequencies of the individual lights. This is what happens in the perception of virtual pitch.
  • the perception of beats of sounds received by both ears separately
as the result of a time analysis of the vibration pattern (i.e. wave shape) of sounds very near in frequency, even if they are received separately by both ears.
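
The arithmetic behind the example of the two lights, and behind virtual pitch, can be written out in a few lines. This is a minimal sketch restricted to whole-number periods and frequencies; the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def coincidence_period(periods):
    """Least common multiple of whole-number periods: time between coincidences."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def common_repetition_frequency(frequencies_hz):
    """Greatest common divisor of whole-number frequencies in Hz."""
    return reduce(gcd, frequencies_hz)


# Two lights flashing every 2 s and every 3 s coincide every 6 s.
print(coincidence_period([2, 3]))

# Harmonics 2f and 3f of a missing 100 Hz fundamental repeat together at 100 Hz.
print(common_repetition_frequency([200, 300]))
```

The second call shows why partials at 200 Hz and 300 Hz can suggest a pitch of 100 Hz even though no component at 100 Hz is present.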

The limits of the periodicity theory rest

  • in its incompleteness
the positional theory also suffers from this defect. The two theories appear complementary and, when considered separately, cannot explain the full range of perceptual processes described;
  • in its inability to explain the process of sound "fusion" that happens in the perception of sounds composed of harmonic (or quasi-harmonic) partials
the analysis of periodicities of time patterns does not seem sufficient to justify that perceptual event. In a composite sound (composed of harmonic partials) many periodic events are present. In what way does that of the fundamental harmonic prevail in pitch perception?

Beyond the positional and periodicity theories. A possible conclusion

It is clear from the brief analysis of the proposed theoretical models that the complete explanation of the phenomena of pitch perception is still to come. In recent years, other models have been proposed (above all to explain the fusion of composite sounds composed of harmonic partials) based on the category theory of spoken language. In summary, it is believed that just as the brain learns to distinguish phonemes of spoken language and to not confuse them even in the presence of minimal differences of the wave shape, so does the perceptual system recognise (starting with the deformation of the basilar membrane) harmonic (or quasi-harmonic) configurations and tends to "hear them" even when these configurations are slightly imprecise or incomplete.

Let's clarify with a few examples.

  • We can clearly distinguish the words "Core" and "Gore" in spoken language. If we use a computer to generate a wave shape "halfway" between the initial C and G, we will perceive either a C or a G. The perceptual system, recognising that the modified wave shape resembles a C (or a G), chooses one of the two "categories" from its repertory of phonemes. The transition from the perception of the C (or the G) is sharp even if the wave shape is gradually modified.
  • In the same way, when a musician (who probably has a more "categorical" perception of the most common pitches than a non-professional listener) hears a singer miss a note (e.g. singing a B flat at a slightly lower frequency), they still tend to recognise the note as a B flat (i.e. the harmonic configuration closest to the singer's actual performance). A mechanism of this type probably allows pitch sensations to be associated even with instruments that generate partials that are not exactly harmonic (e.g. timpani, bars, bells, etc.). For more on this, reread A dialogue: "On the perception of pitch of composite sounds".
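
As a rough analogy for this "categorical" hearing, the sketch below assigns a frequency to the nearest note of the equal-tempered scale. The choice of twelve-tone equal temperament, the reference A4 = 440 Hz and the function name are assumptions; the sketch says nothing about how the brain actually forms its categories.

```python
import math

NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) for the closest equal-tempered note."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    midi_number = 69 + semitones_from_a4  # 69 is A4 in MIDI numbering
    name = NOTE_NAMES[midi_number % 12] + str(midi_number // 12 - 1)
    nominal_hz = a4_hz * 2 ** (semitones_from_a4 / 12)
    cents_off = 1200 * math.log2(freq_hz / nominal_hz)
    return name, cents_off


# A B flat sung a little flat (the nominal Bb4 is about 466.2 Hz).
name, cents = nearest_note(460.0)
print(f"heard as {name}, {cents:+.1f} cents from the nominal pitch")
```

The note is still classified as a B flat even though it is noticeably flat, which mirrors the sharp category boundaries described for phonemes.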

If there are few elements available to the perceptual system for the recognition of harmonic configurations (as happens with instruments that generate almost pure sounds, such as the flute), pitch perception can be ambiguous. The sound of the flute is often perceived as an "octave lower" than the actual note being played.

The mechanism of recognition of harmonic configurations can also provide a possible explanation for the phenomenon of virtual pitch. The complete configuration of the harmonic partials of A4 can be perceived as an incomplete configuration of the harmonics of A3 (one octave lower). The perceptual system, having recognised a "category", hurries to "complete it" causing us to perceive the virtual pitch of the missing fundamental.
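
The relation between the two configurations can be checked directly. The sketch assumes ideal, exactly harmonic partials and the conventional tuning A4 = 440 Hz (so A3 = 220 Hz); the number of partials is arbitrary.

```python
def harmonics(fundamental_hz, count):
    """Set of the first `count` exactly harmonic partials of a fundamental."""
    return {fundamental_hz * k for k in range(1, count + 1)}


a4_partials = harmonics(440, 5)    # 440, 880, ..., 2200 Hz
a3_partials = harmonics(220, 10)   # 220, 440, ..., 2200 Hz

# Every partial of A4 is also a partial of A3.
print(a4_partials <= a3_partials)

# Partials that would have to be "filled in" to hear a complete A3.
missing_partials = sorted(a3_partials - a4_partials)
print(missing_partials)
```

The partials of A4 are exactly the even-numbered partials of A3, so a complete A4 is at the same time an A3 with its fundamental and odd partials missing.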

It is clear that the theory of categories provides an interesting key to the physiology of perceptual phenomena. However, it leaves some questions unanswered:

  • How do we construct our repertory of known categories? Is it innate or acquired through learning?
With regards to this, it is useful to observe that Japanese adults cannot distinguish the phoneme "r" from the phoneme "l" in western languages. However, Japanese children can do it easily if taught. Is this or is this not the same for sound?
  • Why were harmonic configurations selected?
  • Is the perception of other aspects of sound (e.g. timbre) categorical?

In-depth study and links

  1. Hermann L. F. von Helmholtz, Die Lehre von den Tonempfindungen, als Physiologische Grundlage für die Theorie der Musik, Vieweg, Braunschweig, 1863. A full reproduction of the third German edition (1870) and an English translation of its splendid introduction are available online.
  2. Georg von Békésy, Experiments in Hearing, McGraw Hill, New York, 1960
  3. Ichiji Tasaki, "Nerve impulses in individual auditory nerve fibers of guinea pig", Journal of Neurophysiology 17, 97 (1954)
  4. John R. Pierce, The Science of Musical Sound, Scientific American Library, 1983

Creative Commons License
