A Fast Voice Command Recognition Algorithm Based on the Hidden Markov Model Stationary Distribution
DOI: https://doi.org/10.24160/1993-6982-2018-5-65-72

Keywords: hidden Markov models, voice control, pattern recognition, forward algorithm

Abstract
Over the last few decades, hidden Markov models (HMM) have become the dominant technology in automatic speech recognition (ASR) systems. Contemporary HMM-based solutions use Gaussian mixture models (GMM) to model the acoustic variability of speech. ASR algorithms whose acoustic models are built with deep neural networks (DNN) outperform GMMs on large-vocabulary speech. However, these algorithms have extremely high computational complexity, which makes them unsuitable for voice control systems with moderate computational resources. An approach to developing an isolated word recognition algorithm with low computational complexity is considered. All components of the isolated word recognition engine are described. A sequence of quantized Mel-frequency cepstral coefficients (MFCC) is used as the speech signal feature description. A fast isolated word recognition algorithm built on the stationary distribution of a hidden Markov model is described. The proposed algorithm has linear complexity with respect to the length of the observed sequence and requires significantly less memory than algorithms based on GMM or DNN models. The algorithm's recognition performance is evaluated on the TIMIT isolated word dataset and on a database of Russian words compiled by the authors. It is demonstrated that the proposed algorithm's recognition performance is only slightly inferior to that of GMMs and superior to that of self-adjustable neural networks.
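The abstract does not spell out the scoring formula, but the general idea of stationary-distribution scoring can be sketched as follows. Instead of running the forward algorithm (O(T·N²) per word model), each quantized observation symbol is scored against the model's time-independent stationary state distribution, which is linear in the sequence length T. This is a minimal illustrative sketch under that assumption, not the authors' exact algorithm; the function names and the toy model are hypothetical.

```python
import numpy as np

def stationary_distribution(A):
    """Stationary distribution pi of a transition matrix A (pi = pi @ A),
    found as the left eigenvector of A for eigenvalue 1."""
    vals, vecs = np.linalg.eig(A.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def score(obs, A, B):
    """Approximate log-likelihood of a quantized observation sequence.

    Each symbol o_t is scored against the stationary distribution pi,
    giving O(T*N) instead of the forward algorithm's O(T*N^2):
        log P(O) ~ sum_t log( sum_i pi_i * B[i, o_t] )
    """
    pi = stationary_distribution(A)
    return float(np.sum(np.log(B.T @ pi)[obs]))

# Toy 2-state model over 3 quantized MFCC codebook symbols.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # state transition probabilities
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])     # symbol emission probabilities
obs = np.array([0, 1, 2, 2, 1])     # quantized MFCC codebook indices
print(score(obs, A, B))
```

In a recognizer, each vocabulary word would have its own (A, B) pair, and the word whose model yields the highest score for the observed sequence would be chosen.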
References
1. Dahl G.E., Yu D., Deng L., Acero A. Context-dependent Pre-trained Deep Neural Networks for Large-vocabulary Speech Recognition // IEEE Trans. Audio, Speech and Language Proc. 2012. V. 20. No. 1. Pp. 30—42.
2. Hinton G. et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition // IEEE Signal Proc. Mag. 2012. V. 29. No. 6. Pp. 82—97.
3. Mohamed A., Dahl G.E., Hinton G. Acoustic Modeling Using Deep Belief Networks // IEEE Trans. Audio, Speech and Language Proc. 2012. V. 20. No. 1. Pp. 14—22.
4. LeCun Y., Bengio Y., Hinton G. Deep Learning // Nature. 2015. V. 521. Pp. 436—444.
5. Vagin V.N., Ganishev V.A. Clustering Users by Voice Using Improved Self-organizing Growing Neural Networks // Programmnye Produkty i Sistemy. 2015. No. 3. Pp. 136—141. (in Russian).
6. Mohammadi M., Sadegh Mohammadi H.R. Study of Speech Features Robustness for Speaker Verification Application in Noisy Environments // Proc. 8th Intern. Symp. Telecommunications. 2016. Pp. 489—493.
7. Molau S., Pitz M., Schlüter R., Ney H. Computing Mel-frequency Cepstral Coefficients on the Power Spectrum // IEEE Intern. Conf. Acoustics, Speech and Signal Proc. 2001. V. 1. Pp. 73—76.
8. Paramonov P., Sutula N. Simplified Scoring Methods for HMM-based Speech Recognition // Soft Computing. 2016. V. 20. Pp. 3455—3460.
9. Bertsekas D., Tsitsiklis J. Introduction to Probability. Belmont: Athena Sci., 2008.
10. Rabiner L.R. Hidden Markov Models and Their Application in Selected Speech Recognition Tasks: a Review // Trudy In-ta Inzhenerov po Elektrotekhnike i Radioelektronike. 1989. V. 77. No. 2. Pp. 86—120. (in Russian).
11. Ting H.-N., Yong B.-F., Mirhassani S.M. Self-adjustable Neural Network for Speech Recognition // Engineering Appl. Artificial Intelligence. 2013. V. 26. Pp. 2022—2027.
12. Murphy K.P. Machine Learning: a Probabilistic Perspective. Cambridge, Massachusetts; London: MIT Press, 2012.
---
For citation: Paramonov P.A., Ognev I.V. A Fast Voice Command Recognition Algorithm Based on the Hidden Markov Model Stationary Distribution. MPEI Vestnik. 2018;5:65—72. (in Russian). DOI: 10.24160/1993-6982-2018-5-65-72.

