Development and Testing of a Methodology for Evaluating the Effectiveness of Neural Network Models in Annotating Customer Helpline Calls
DOI:
https://doi.org/10.24160/1993-6982-2026-3-171-179
Keywords:
automatic speech recognition, transcription, phone calls, transformers, conformers, frequently asked questions
Abstract
This paper addresses automatic speech recognition (ASR) for telephone calls from users to a support service, where the resulting transcriptions are intended for further processing and analysis. It examines how to choose a suitable neural network ASR model among popular open-source projects built on transformer and conformer architectures. A methodology for testing ASR models has been developed; it uses a representative set of Russian-language recordings of real calls and takes the models' configurable hyperparameters into account. Using the proposed methodology, several widely used ASR models were compared by recognition accuracy and performance. Based on the comparison, a model was selected for the target telephone support service, and two assumptions were formulated about the existence of a maximum amount of training data beyond which recognition accuracy depends mainly on the number of trained parameters. A postprocessing pipeline is proposed that includes speaker diarization and automatic annotation using large language models, to be used for building a database of frequently asked questions and for other applied tasks.
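The abstract does not name the metrics behind the "recognition accuracy and performance" criteria. As an illustration only, the sketch below assumes two common choices: word error rate (WER) for accuracy and real-time factor (RTF) for performance. The function names `word_error_rate` and `real_time_factor` are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: transcripts are compared after lowercasing and
    whitespace tokenization; the paper may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Under these assumptions, each candidate model would be run over the same set of call recordings, and its average WER and RTF would be recorded for each hyperparameter configuration tested.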
---
For citation: Filatov S.A., Eliseev V.L. Development and Testing of a Methodology for Evaluating the Effectiveness of Neural Network Models in Annotating Customer Helpline Calls. Bulletin of MPEI. 2026;3:171—179. (in Russian). DOI: 10.24160/1993-6982-2026-3-171-179
---
Conflict of interest: the authors declare no conflict of interest.

