Abstract:
This research aims to develop a grammar checker for Setswana declarative
sentences using Long Short-Term Memory recurrent neural networks
(LSTM-RNNs). The research was motivated by the fact that Setswana is recognized
as an under-resourced language that lacks natural language processing (NLP)
tools such as grammar checkers, which delays the language’s technological
progress. A Setswana grammar checker is a prerequisite for developing other
Human Language Technology (HLT) applications, such as machine translators and
parsers, that the language needs in order to have a presence on the web; it
therefore contributes to the language’s technological progress. Various
techniques have been implemented to develop grammar checkers for different
languages. One is the rule-based approach, whose drawback is that it is
language-specific: many rules must be written to cover the grammar of that
language, which can be tedious and time-consuming. Another is the syntax-based
approach, whose disadvantage is that it depends on the availability of a parser
for the language. This research implements the statistical
approach to grammar checking: the grammar checker is built with LSTM-RNNs. The
advantage of this technique is that it enables the development of
language-independent grammar checkers, and the developer does not need deep
knowledge of the underlying grammar of the language they are working with.
The Setswana grammar checker was developed using 1700 Setswana
sentences; 750 incorrect sentences and 750 correct sentences. The training module
had a validation accuracy of 0.95, a validation loss of 0.05, and a training loss of 0.1.
The testing module had a testing accuracy of 0.96 and a testing loss of 0.06. Results
of this study indicate that LSTM-RNNs can extract the pattern or word order
followed by Setswana sentences and use this information to determine the
grammatical correctness of Setswana text, without the hand-written rules or
language parser that the rule-based and syntax-based grammar checking
techniques require.