Speech Classification Based on Vocal Tract Excitation

Speech is classified into three broad categories depending on the nature of the excitation of the vocal tract: voiced, unvoiced, and plosive sounds.


Voiced Sounds

If the excitation of the vocal tract originates at the glottis and is produced by periodic vibration of the vocal cords, the result is voiced speech, such as ‘oh’ or ‘eh’. Because the vibration is periodic, the frequency spectrum of the sound is rich in harmonics at multiples of the fundamental frequency, or pitch. The airflow waveform resulting from the vibrations is approximately triangular, so the harmonics decay in amplitude at a rate of approximately 12 dB/octave. The vocal tract acts as a resonator, amplifying some of these harmonics and attenuating others to produce voiced sounds. The pitch is controlled by the tension in the vocal cords and the air pressure from the lungs. Typical pitch values are roughly 85-180 Hz for adult men, 165-255 Hz for adult women, and 250-400 Hz for children.
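The 12 dB/octave figure can be checked numerically. The sketch below is only an illustration under stated assumptions: it models the glottal airflow as an ideal symmetric triangular wave (real glottal pulses are asymmetric), and the function names, the 100 Hz pitch, and the 16 kHz sampling rate are hypothetical choices, not values from the text. A symmetric triangle contains only odd harmonics, so the slope is estimated from the first and third harmonics; the result should come out close to 12 dB/octave.

```python
import numpy as np

def triangular_wave(f0_hz, fs_hz, duration_s):
    """Symmetric triangular wave, an idealized stand-in for glottal airflow."""
    period = int(round(fs_hz / f0_hz))            # samples per pitch period
    n = np.arange(int(fs_hz * duration_s))
    phase = (n % period) / period                 # 0 <= phase < 1
    return 2.0 * np.abs(2.0 * phase - 1.0) - 1.0  # values in [-1, 1]

def harmonic_level_db(x, fs_hz, f0_hz, k):
    """Level in dB of the k-th harmonic, read from the FFT magnitude."""
    magnitude = np.abs(np.fft.rfft(x)) / len(x)
    bin_index = int(round(k * f0_hz * len(x) / fs_hz))
    return 20.0 * np.log10(magnitude[bin_index])

def decay_db_per_octave(f0_hz=100.0, fs_hz=16000.0, duration_s=1.0):
    """Estimate the harmonic roll-off from the 1st and 3rd harmonics."""
    x = triangular_wave(f0_hz, fs_hz, duration_s)
    drop_db = (harmonic_level_db(x, fs_hz, f0_hz, 1)
               - harmonic_level_db(x, fs_hz, f0_hz, 3))
    # The 3rd harmonic lies log2(3) octaves above the fundamental.
    return drop_db / np.log2(3.0)

print(round(decay_db_per_octave(), 2), "dB/octave")
```

The slope follows from the harmonic amplitudes of a triangular wave falling off as 1/k², which corresponds to a factor of 4 (about 12 dB) for each doubling of frequency.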


Unvoiced Sounds

The vocal cords do not vibrate in the production of unvoiced sounds. The excitation is due to turbulent airflow past a narrow constriction and tends to be random in nature, with a flat, continuous spectrum.

  • The sound is called aspirated if the constriction is at the glottis, for example ‘h’.
  • If the constriction is at some point along the vocal tract, the sound is called fricative, for example ‘s’ or ‘f’.
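The contrast between a harmonic (voiced) spectrum and a flat, continuous (unvoiced) spectrum can be illustrated with two idealized excitation signals. This is a minimal sketch, not a model of real speech: it assumes an impulse train for voiced excitation and Gaussian white noise for turbulent excitation, and it uses spectral flatness (geometric mean of the power spectrum divided by its arithmetic mean) as one possible way to quantify "flat". All names and parameter values are hypothetical.

```python
import numpy as np

def pulse_train(f0_hz, fs_hz, duration_s):
    """Idealized voiced excitation: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    x = np.zeros(int(fs_hz * duration_s))
    x[::period] = 1.0
    return x

def white_noise(fs_hz, duration_s, seed=0):
    """Idealized unvoiced excitation: Gaussian white noise."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(int(fs_hz * duration_s))

def spectral_flatness(x, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a peaky (harmonic) spectrum; larger values
    indicate a flatter, noise-like spectrum. eps avoids log(0).
    """
    power = np.abs(np.fft.rfft(x)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

voiced = pulse_train(100.0, 16000.0, 1.0)
unvoiced = white_noise(16000.0, 1.0)
print("voiced flatness:  ", spectral_flatness(voiced))
print("unvoiced flatness:", spectral_flatness(unvoiced))
```

The impulse train concentrates its energy at multiples of the pitch frequency, so its flatness is close to zero, while the noise spreads energy across all frequencies and scores much higher.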


Mixed Excitation

Mixed excitation is also possible, as in the class of sounds known as voiced fricatives, such as ‘v’ in ‘vote’ and ‘z’ in ‘zoo’. In voiced fricatives, the turbulent excitation is amplitude-modulated periodically by the vibration of the vocal cords. Variations in the velum and nasal cavity are associated with characteristic features in the spectrum of nasalized speech sounds. Anatomical variations in the structure of the palate are associated with atypical speech sounds such as lisps or abnormal nasality.
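The periodic amplitude modulation of turbulent noise in a voiced fricative can be sketched as follows. This is an illustration under assumptions, not a production model: the turbulence is taken to be Gaussian white noise, the modulating envelope is a raised cosine at the pitch frequency, and the function names and the 100 Hz and 16 kHz values are hypothetical. The second function recovers the modulation rate from the instantaneous power of the signal.

```python
import numpy as np

def mixed_excitation(f0_hz, fs_hz, duration_s, seed=0):
    """White noise whose amplitude is modulated periodically at f0_hz."""
    rng = np.random.default_rng(seed)
    n = int(fs_hz * duration_s)
    t = np.arange(n) / fs_hz
    noise = rng.standard_normal(n)                 # turbulent component
    # Raised-cosine envelope in [0, 1], repeating once per pitch period.
    envelope = 0.5 * (1.0 + np.cos(2.0 * np.pi * f0_hz * t))
    return envelope * noise

def dominant_modulation_hz(x, fs_hz):
    """Frequency of the strongest non-DC peak in the spectrum of x squared."""
    power = x ** 2
    power = power - power.mean()                   # remove the DC offset
    spectrum = np.abs(np.fft.rfft(power))
    k = int(np.argmax(spectrum[1:])) + 1           # skip the DC bin
    return k * fs_hz / len(x)

x = mixed_excitation(100.0, 16000.0, 1.0)
print("estimated modulation rate:", dominant_modulation_hz(x, 16000.0), "Hz")
```

The waveform itself still looks noise-like, but its power rises and falls once per pitch period, which is why the pitch rate reappears as a peak in the spectrum of the squared signal.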

