Statistical Methods for Speech Recognition


Price: $55

This book reflects decades of important research on the mathematical foundations of speech recognition. It focuses on underlying statistical techniques such as hidden Markov models, decision trees, the expectation-maximization algorithm, information theoretic goodness criteria, maximum entropy probability estimation, parameter and data clustering, and smoothing of probability distributions. The author’s goal is to present these principles clearly in the simplest setting, to show the advantages of self-organization from real data, and to enable the reader to apply the techniques.


“For the first time, researchers in this field will have a book that will serve as the ‘bible’ for many aspects of language and speech processing. Frankly, I can’t imagine a person working in this field not wanting to have a personal copy.” –Victor Zue, MIT Laboratory for Computer Science
Review By: Bob Carpenter (New York, NY) –
Thorough Overview of Stats and Algorithms for Speech Rec, December 12, 2001

This book provides a comprehensive introduction to the statistical models and algorithms used for speech recognition. Jelinek sets up the speech recognition problem in the traditional way as the decoding half of Shannon’s noisy channel model. While Jelinek glosses over signal processing, he provides an excellent overview of the symbolic stages of processing involved in speech recognition.
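
For readers who have not met the noisy channel setup before, the decoding problem is usually written as follows. This is the standard textbook formulation, not a quotation from the book; the symbols W (a candidate word sequence) and A (the observed acoustic evidence) are notation assumed here for illustration.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid A)
        \;=\; \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        \;=\; \arg\max_{W} P(A \mid W)\, P(W)
```

The last step drops P(A) because it does not depend on W. The two remaining factors correspond to the acoustic model, P(A | W), and the language model, P(W).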

After a quick introduction, Jelinek digs into the statistics behind Hidden Markov Models (HMMs), the foundation of almost all of today’s speech recognizers. This is followed by chapters devoted to acoustic modeling (probability of acoustics given words), language modeling (probability of a given sequence of words), and the algorithmic search induced by this model. There are also advanced chapters on fast match (widely used heuristics for pruning search), the Expectation-Maximization (EM) algorithm for training, and the use of decision trees, maximum entropy and backoff for language models. He covers several auxiliary topics including information theory and perplexity, the spelling-to-phoneme mapping, and the use of triphones for cross-phoneme modeling. Each chapter is a worthy introduction to an important topic.
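
To give a flavor of the HMM computations mentioned above, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence under a discrete HMM. It is an illustration only, not code from the book: the function name `forward_likelihood`, the two-state toy model, and all of its numbers are assumptions made up for this example, and the sketch assumes a non-empty observation sequence.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) under a discrete HMM using the forward algorithm.

    initial[s]        -- probability of starting in state s
    transition[r][s]  -- probability of moving from state r to state s
    emission[s][o]    -- probability of emitting symbol o while in state s
    observations      -- non-empty list of symbol indices
    """
    n_states = len(initial)
    # alpha[s] = P(o_1 .. o_t, state at time t is s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][obs]
            for s in range(n_states)
        ]
    # Summing over the final state gives the total sequence probability.
    return sum(alpha)


# Hypothetical toy model: two hidden states, two output symbols.
INITIAL = [0.6, 0.4]
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
EMISSION = [[0.9, 0.1], [0.2, 0.8]]

if __name__ == "__main__":
    print(forward_likelihood(INITIAL, TRANSITION, EMISSION, [0, 1, 0]))
```

Real recognizers work with continuous acoustic features and log probabilities to avoid underflow; the plain products here are kept only for readability.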

This book does not presuppose much in the way of mathematical, computational, or linguistic background. A simple intro to probability and some experience with search problems would help but aren’t necessary; you’ll learn a lot about these topics by reading the book.

All in all, this is the best thorough introduction to speech recognition that you can find. Read it along with Manning and Schuetze’s “Foundations of Statistical Natural Language Processing” from the same series; there’s a little overlap in language modeling, but not much. You might want to start with the gentler book by Jurafsky and Martin, “Speech and Language Processing”, before tackling either Jelinek or Manning and Schuetze.


© 2019 Interactive Speech. All Rights Reserved.
