Automatic Correction of Spelling Errors

Authors

  • Oleg V. Barten′ev
  • Dmitriy A. Titov

DOI:

https://doi.org/10.24160/1993-6982-2019-5-117-128

Keywords:

database, automatic text correction, symmetric deletion algorithm, precedent, spelling error, grammatical error

Abstract

The paper considers the information support and algorithms of a software application that finds and corrects spelling and grammatical errors in Russian-language texts. The application relies on a database whose tables are filled from a morphological dictionary of the Russian language containing more than four million word forms and from texts of different genres.

Before being subjected to correction, the text is divided into fragments; punctuation marks are used as text separators. The obtained text fragments are checked and corrected independently of each other.
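
The fragment-splitting step can be illustrated with a minimal sketch. The separator set and the function name `split_into_fragments` are assumptions made for illustration; the abstract says only that punctuation marks act as separators, not which ones.

```python
import re

# Assumed separator set; the actual list used by the application is not given.
SEPARATORS = r"[.,;:!?()\"«»]+"


def split_into_fragments(text: str) -> list[str]:
    """Split text at punctuation marks and return the non-empty fragments."""
    parts = re.split(SEPARATORS, text)
    return [part.strip() for part in parts if part.strip()]


# Each fragment would then be checked and corrected on its own.
print(split_into_fragments("First clause, second clause! A question?"))
# ['First clause', 'second clause', 'A question']
```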

The text is corrected in two stages. At the first stage, spelling errors and errors caused by incorrect word formation are corrected using a spelling correction method based on the symmetric deletion algorithm. For each erroneous word, a list of replacement candidates is formed. The candidate with the lowest replacement cost, an indicator of how close the candidate is to the word being replaced, is chosen as the replacement. If several candidates have the same replacement cost, preference is given to the one that occurs most often in the texts previously used to fill the application database. At the second stage, certain types of grammatical errors are corrected. This correction relies on precedents: occurrences of a “word, next word” pair in texts that have undergone editorial correction. Using the precedents found in the database, the application highlights the words to be replaced. As at the first stage, the replacement is chosen from a list of candidates, but no replacement is made if its cost exceeds the permissible value.
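
The two stages can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the replacement cost is approximated by Levenshtein distance, the permissible cost `MAX_COST` is set to 2, the same bound is applied at both stages, and all function names and the toy English data are hypothetical.

```python
from itertools import combinations

MAX_COST = 2  # assumed permissible replacement cost and delete bound


def deletes(word: str, max_deletes: int = MAX_COST) -> set[str]:
    """Return the word and all strings formed by deleting up to max_deletes characters."""
    variants = {word}
    for k in range(1, min(max_deletes, len(word) - 1) + 1):
        for kept in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in kept))
    return variants


def build_index(frequencies: dict[str, int]) -> dict[str, set[str]]:
    """Map each delete variant to the dictionary words that produce it."""
    index: dict[str, set[str]] = {}
    for word in frequencies:
        for variant in deletes(word):
            index.setdefault(variant, set()).add(word)
    return index


def cost(a: str, b: str) -> int:
    """Levenshtein distance, used here as a stand-in for the replacement cost."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def pick(word: str, candidates: set[str], frequencies: dict[str, int]) -> str:
    """Choose the lowest-cost candidate; ties go to the more frequent word."""
    allowed = [c for c in candidates if cost(word, c) <= MAX_COST]
    if not allowed:
        return word  # no candidate within the permissible cost
    return min(allowed, key=lambda c: (cost(word, c), -frequencies.get(c, 0), c))


def correct_spelling(word: str, index: dict[str, set[str]],
                     frequencies: dict[str, int]) -> str:
    """Stage one: look up candidates through shared delete variants."""
    if word in frequencies:
        return word
    candidates: set[str] = set()
    for variant in deletes(word):
        candidates |= index.get(variant, set())
    return pick(word, candidates, frequencies)


def correct_by_precedent(prev_word: str, word: str,
                         precedents: dict[str, set[str]],
                         frequencies: dict[str, int]) -> str:
    """Stage two: if the pair is not a known precedent, try known followers."""
    followers = precedents.get(prev_word, set())
    if not followers or word in followers:
        return word
    return pick(word, followers, frequencies)


frequencies = {"text": 50, "test": 30, "tent": 5, "texts": 20}
index = build_index(frequencies)
precedents = {"these": {"texts"}}
print(correct_spelling("tezt", index, frequencies))  # text
print(correct_by_precedent("these", "text", precedents, frequencies))  # texts
```

In the first call, "text", "test" and "tent" all have cost 1, so the tie is broken by frequency. Indexing deletions of dictionary words in advance means a lookup only has to generate deletions of the input word, rather than every possible insertion and substitution.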

The text can be corrected either automatically or interactively, with the user selecting the replacement word. On a test data set containing both spelling and grammatical errors, the application corrects more words than Microsoft Word and Yandex.Speller do.

Author Biographies

Oleg V. Barten′ev

Ph.D. (Techn.), Assistant Professor of Applied Mathematics Dept., NRU MPEI, e-mail: mdf4@mail.ru

Dmitriy A. Titov

Master Student of Applied Mathematics Dept., NRU MPEI, e-mail: dimnrtyu@mail.ru

For citation: Barten′ev O.V., Titov D.A. Automatic Correction of Spelling Errors. Bulletin of MPEI. 2019;5:117—128. (in Russian). DOI: 10.24160/1993-6982-2019-5-117-128.

Published

2018-10-23

Issue

Section

Mathematical and Software Support of Computing Machines, Complexes and Computer Networks (05.13.11)