Knowledge base: Adam Mickiewicz University

Algorithms for automatic grammatical error correction

Roman Grundkiewicz

Abstract

This thesis explores automated grammatical error correction (GEC) in texts written by non-native English speakers, focusing on the machine translation approach to GEC. To overcome the data sparsity problem, we developed a method for automatically extracting potential errors from Wikipedia revision histories and created the largest publicly available error-annotated corpus to date. We assess the usefulness of automatic GEC-specific metrics by their correlation with human judgements, conducting the first large-scale human evaluation study of automated GEC systems. Our proposed phrase-based statistical machine translation (SMT) system achieved new state-of-the-art results on the CoNLL-2014 test data – a standard benchmark for GEC provided during the Conference on Natural Language Learning shared task in 2014. We show that optimizing parameters towards the task-specific evaluation metric and adding new GEC-adapted dense features are crucial for building a reliable and effective SMT-based GEC system. We also examine two methods that incorporate discriminative components into the generative SMT log-linear model. With the second method – the first reported application of sparse features to GEC – our results significantly improve over the previous state of the art in the field.
Record ID
UAM9b634f27d6e64bd0b9d13351a0899b83
Diploma type
Doctor of Philosophy
Author
Roman Grundkiewicz
Title in Polish
Algorytmy automatycznej poprawy błędów językowych
Title in English
Algorithms for automatic grammatical error correction
Language
English (en)
Certifying Unit
Faculty of Mathematics and Computer Science (SNŚ/WMiI/FoMaCS)
Discipline
information science / (mathematics domain) / (physical sciences)
Scientific discipline (2.0)
6.2 computer and information sciences
Defense Date
14-03-2018
End date
14-03-2018
Supervisor
URL
http://hdl.handle.net/10593/22067
Keywords in English
grammatical error correction, statistical machine translation, optimization, sparse features
Uniform Resource Identifier
https://researchportal.amu.edu.pl/info/phd/UAM9b634f27d6e64bd0b9d13351a0899b83/
