UNB Libraries: Scholar Research Repository

Browsing by Author "Bear, Diego"

    Leveraging bilingual dictionaries to learn word embeddings for low-resource languages
    (University of New Brunswick, 2025-02) Bear, Diego; Cook, Paul
    Word embeddings [33, 36] have been used to improve the performance of natural language processing systems on a wide variety of tasks, including information retrieval [42] and machine translation [37]. However, approaches to learning word embeddings typically require large corpora of running text to learn high-quality representations, and for many languages such resources are unavailable. This is the case for Wolastoqey and Mi’kmaq, two endangered, low-resource Eastern Algonquian languages. Because no large corpora exist for Wolastoqey and Mi’kmaq, in this thesis we instead leverage bilingual dictionaries to learn Wolastoqey and Mi’kmaq word embeddings: we encode each word's corresponding English definition into a vector representation using English word and sequence representation models. Specifically, we consider representations based on pretrained word2vec [33], RoBERTa [31], and sentence-RoBERTa [40] models, as well as fine-tuned sentence-RoBERTa models [40]. We evaluate these embeddings on word prediction tasks focused on part-of-speech, animacy, and transitivity; on semantic clustering; and on reverse dictionary search. We additionally construct word embeddings for higher-resource languages (English, German, and Spanish) using our methods and evaluate them on existing word-similarity datasets. Our findings indicate that our methods can produce meaningful vector representations both for low-resource languages such as Wolastoqey and Mi’kmaq and for higher-resource languages.
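The core idea in the abstract, representing a dictionary word by encoding its English definition, can be illustrated with the simplest of the listed encoders. The following is a minimal sketch, assuming mean pooling of word2vec-style vectors over the definition's tokens; the abstract does not say how the thesis combines word vectors, so the pooling is an assumption. The vectors, the placeholder headwords `headword_A` and `headword_B`, and all helper names are hypothetical toy stand-ins, not real Wolastoqey or Mi’kmaq data or the authors' code. The sketch also shows reverse dictionary search as nearest-neighbour lookup by cosine similarity, which is likewise an assumed formulation.

```python
import math

# Hypothetical 3-dimensional toy vectors standing in for pretrained English
# word2vec vectors; real vectors would be loaded from a trained model.
ENGLISH_VECTORS = {
    "a": [0.1, 0.1, 0.1],
    "small": [1.0, 0.0, 0.0],
    "bird": [0.0, 1.0, 0.0],
    "fish": [0.0, 0.5, 1.0],
    "water": [0.0, 0.0, 1.0],
}

# Hypothetical bilingual dictionary: placeholder headwords mapped to English
# definitions (not real Wolastoqey or Mi'kmaq entries).
DICTIONARY = {
    "headword_A": "a small bird",
    "headword_B": "a fish in water",
}


def embed_definition(definition, vectors):
    """Mean-pool the vectors of the definition's in-vocabulary tokens (assumed pooling)."""
    tokens = [t for t in definition.lower().split() if t in vectors]
    if not tokens:
        raise ValueError(f"no known tokens in definition: {definition!r}")
    total = [0.0] * len(vectors[tokens[0]])
    for token in tokens:
        for i, x in enumerate(vectors[token]):
            total[i] += x
    return [x / len(tokens) for x in total]


def build_embeddings(dictionary, vectors):
    """Map each headword to the pooled vector of its English definition."""
    return {word: embed_definition(defn, vectors) for word, defn in dictionary.items()}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reverse_lookup(query, embeddings, vectors, k=1):
    """Reverse dictionary search: rank headwords by similarity to an English query."""
    q = embed_definition(query, vectors)
    ranked = sorted(embeddings, key=lambda w: cosine(q, embeddings[w]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    embeddings = build_embeddings(DICTIONARY, ENGLISH_VECTORS)
    print(reverse_lookup("small bird", embeddings, ENGLISH_VECTORS))  # ['headword_A']
    print(reverse_lookup("water", embeddings, ENGLISH_VECTORS))       # ['headword_B']
```

Out-of-vocabulary tokens such as "in" are simply skipped. Replacing `embed_definition` with a sentence encoder (for example a sentence-RoBERTa model) would leave the rest of the pipeline unchanged; mean pooling is used here only because it needs no external dependencies.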
University of New Brunswick: established in 1785
