Cross-lingual multiword expression identification and idiomaticity prediction using autoregressive and masked language models

dc.contributor.advisor: Cook, Paul
dc.contributor.author: Hasan, Md. Arid
dc.date.accessioned: 2025-08-13T14:57:10Z
dc.date.available: 2025-08-13T14:57:10Z
dc.date.issued: 2025-05
dc.description.abstract: Token-level multiword expression (MWE) identification and idiomaticity prediction remain major challenges in natural language processing, requiring approaches that can handle non-compositional meanings and idiosyncratic syntactic behavior. These tasks involve identifying idiomatic expressions at the level of individual tokens, which allows systems to distinguish figurative from literal usages. This thesis explores cross-lingual MWE identification on the PARSEME 1.2 shared task dataset and idiomaticity prediction on the SemEval 2022 Task 2 dataset, in a setting where models are evaluated on unseen languages. We employ multilingual masked language models (MLMs), e.g., XLM-R and mT5, that are larger than those used in previous work based on supervised fine-tuning [137]. We also employ larger autoregressive models, e.g., GPT-4o, which previous work on these tasks has not considered. We apply supervised fine-tuning to both MLMs and autoregressive models, and additionally apply a prompt-based approach to autoregressive models. Our findings indicate that larger MLMs do not outperform the results of Swaminathan and Cook [137] on the SemEval and PARSEME tasks, but that supervised fine-tuning of autoregressive models does.
dc.description.copyright: © Md. Arid Hasan, 2025
dc.format.extent: viii, 94
dc.format.medium: electronic
dc.identifier.uri: https://unbscholar.lib.unb.ca/handle/1882/38373
dc.language.iso: en
dc.publisher: University of New Brunswick
dc.rights: http://purl.org/coar/access_right/c_abf2
dc.subject.discipline: Computer Science
dc.title: Cross-lingual multiword expression identification and idiomaticity prediction using autoregressive and masked language models
dc.type: master thesis
oaire.license.condition: other
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of New Brunswick
thesis.degree.level: masters
thesis.degree.name: M.C.S.
