Natural Language Morphology Representation
This thesis defines Lightweight Morphology, an alternative to stemming that generates the inflections and derivations of an input word using (1) a set of pattern-matching rules that produce morphological variants, (2) rules written in Java, and (3) an exception table that handles a language's irregular forms. A specification language, LiteMorph, was developed to represent natural language morphology for Lightweight Morphology. A French specification was written in LiteMorph; it comprises 526 rules, 41 rule sets and an exception table of 16,842 words. Exact-match queries, stemming and Lightweight Morphology were then compared. Using a differential recall measure on a collection of 533 documents (Hansard proceedings of the 36th Parliament of Canada), we showed that Lightweight Morphology has, on average, 3.9 times more queries that retrieve fewer irrelevant documents than stemming. The French version has, on average, 2.5 times more queries that retrieve more relevant documents than stemming. Two new measures of morphological consistency, reflexivity and transitivity, were defined and tested. The English and French LangLMs have reflexivity scores of around 0.9 and transitivity scores below 0.09.
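To make the three components concrete, the following is a minimal Java sketch of how an exception table and suffix-rewrite rules could combine to expand a word into its variants. It is an illustration under stated assumptions, not the LiteMorph rule syntax or the thesis implementation: the class `LightMorphSketch`, the record `SuffixRule`, the method `variants`, the two rules and the single exception entry are all hypothetical, and the "first matching rule wins" policy is an assumption chosen for simplicity.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of rule-based variant generation in the spirit of
 * Lightweight Morphology. Rules and exceptions are illustrative only and are
 * not taken from the LiteMorph French specification.
 */
public class LightMorphSketch {

    /** Hypothetical suffix-rewrite rule: strip {@code suffix}, then append each ending. */
    record SuffixRule(String suffix, List<String> endings) {}

    // Illustrative rules: regular French "-er" verbs and "-al" adjectives.
    private static final List<SuffixRule> RULES = List.of(
        new SuffixRule("er", List.of("er", "e", "es", "ons", "ez", "ent", "ait", "ant")),
        new SuffixRule("al", List.of("al", "aux", "ale", "ales"))
    );

    // Illustrative exception table: irregular forms are listed explicitly.
    private static final Map<String, List<String>> EXCEPTIONS = Map.of(
        "aller", List.of("aller", "vais", "vas", "va", "allons", "allez", "vont")
    );

    /** Returns the word plus its variants; the exception table takes priority over the rules. */
    public static Set<String> variants(String word) {
        Set<String> out = new LinkedHashSet<>();
        out.add(word);
        List<String> irregular = EXCEPTIONS.get(word);
        if (irregular != null) {
            out.addAll(irregular);
            return out;
        }
        for (SuffixRule rule : RULES) {
            if (word.endsWith(rule.suffix())) {
                String stem = word.substring(0, word.length() - rule.suffix().length());
                for (String ending : rule.endings()) {
                    out.add(stem + ending);
                }
                break; // assumption: only the first matching rule applies
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("parler"));   // regular verb, expanded by the "-er" rule
        System.out.println(variants("aller"));    // irregular verb, taken from the exception table
        System.out.println(variants("national")); // adjective, expanded by the "-al" rule
    }
}
```

Unlike a stemmer, which maps every form to one truncated key, this approach expands the query word into a set of surface forms that can each be matched exactly. Words that match no rule and have no exception entry are returned unchanged.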