Determining if this word is used like that word: predicting usage similarity with supervised and unsupervised approaches

Thumbnail Image
Journal Title
Journal ISSN
Volume Title
University of New Brunswick
Determining the meaning of a word in context is an important task for a variety of natural language processing applications such as translating between languages, summarizing paragraphs, and phrase completion. Usage similarity (USim) is an approach to describe the meaning of a word in context that does not rely on a sense inventory -- a set of dictionary-like definitions. Instead, pairs of usages of a target word are rated in terms of their similarity on a scale. In this thesis, we evaluate unsupervised approaches to USim based on embeddings for words, contexts, and sentences, and achieve state-of-the-art results over two USim datasets. We further consider supervised approaches to USim, and find that they can increase the performance of our models. We look into a more detailed evaluation, observing the performance on different parts-of-speech as well as the change in performance when using different features. Our models also do competitively well in two word sense induction tasks, which involve clustering instances of a word based on the meaning of the word in context.