Token-level identification of multiword expressions using pre-trained multilingual language models

dc.contributor.advisor: Cook, Paul
dc.contributor.author: Swaminathan, Raghuraman
dc.date.accessioned: 2024-02-22T18:19:10Z
dc.date.available: 2024-02-22T18:19:10Z
dc.date.issued: 2023-09
dc.description.abstract: Multiword expressions (MWEs) are combinations of words where the meaning of the expression cannot be derived from its component words. MWEs are commonly used in different languages and are difficult to identify. For NLP tasks such as sentiment analysis and machine translation, it is important that language models automatically identify and classify these MWEs. While considerable work has been done on identifying and classifying MWEs, little work has been done in a cross-lingual setting. In this thesis, we consider novel cross-lingual settings for MWE identification and idiomaticity prediction in which systems are tested on languages that are unseen during training. We use multilingual models of BERT, specifically mBERT, RoBERTa and mDeBERTa. Our findings indicate that pre-trained multilingual language models are able to learn knowledge about MWEs and idiomaticity that is not language-specific. Moreover, we find that training data from other languages can be leveraged to give improvements over monolingual models.
dc.description.copyright: © Raghuraman Swaminathan, 2023
dc.format.extent: vii, 72
dc.format.medium: electronic
dc.identifier.uri: https://unbscholar.lib.unb.ca/handle/1882/37717
dc.language.iso: en
dc.publisher: University of New Brunswick
dc.rights: http://purl.org/coar/access_right/c_abf2
dc.subject.discipline: Computer Science
dc.title: Token-level identification of multiword expressions using pre-trained multilingual language models
dc.type: master thesis
oaire.license.condition: other
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of New Brunswick
thesis.degree.level: masters
thesis.degree.name: M.C.S.

Files

Original bundle
Name: Raghuraman Swaminathan - Thesis.pdf
Size: 1 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.13 KB
Format: Item-specific license agreed upon to submission