That ain’t how I speak: Personalizing natural language processing
University of New Brunswick
Natural language processing (NLP) involves automatically analyzing text written by human authors. Each person develops their own use of a language, known as an idiolect, which can cause generic NLP systems to perform poorly on their text. Ideally, each person would have their own personalized system tailored to them. In this thesis, I demonstrate the potential benefits of personalization in three NLP tasks: language modeling (estimating the probability of a sequence of words), authorship verification (determining whether a document was written by a specific person), and word sense disambiguation (assigning a dictionary-like meaning to a word in context). Personalization has not been widely studied for these tasks, and, to the best of my knowledge, this is the first work to consider personalization for word sense disambiguation, for which I design a novel dataset. For each task, I show the performance gains that the proposed personalized models achieve over state-of-the-art models. The experiments in this thesis are designed without consideration of people's demographics, and all personalized methods require relatively small amounts of text from an individual. These two criteria are respected to ensure that the personalized methods work well for each individual, regardless of their demographics or how much text they have authored.