A comparison of machine learning algorithms for zero-shot cross-lingual phishing detection

dc.contributor.advisor: Hakak, Saqib
dc.contributor.advisor: Cook, Paul
dc.contributor.author: Staples, Dakota
dc.date.accessioned: 2024-02-22T14:54:09Z
dc.date.available: 2024-02-22T14:54:09Z
dc.date.issued: 2023-08
dc.description.abstract: Phishing is a major problem worldwide. Existing studies have focused mainly on detecting phishing emails in a single language (mostly English). Detecting phishing emails in multiple languages is challenging because of a lack of datasets. Without ample data from which to learn, models cannot accurately distinguish a benign email from a spam email, resulting in false positives and false negatives. This research compares the performance of numerous machine learning models and transformers using zero-shot learning for multilingual phishing detection. In a zero-shot learning set-up, the model is trained on one language and tested on another. English, French, and Russian emails are used as the training and testing languages. My results show that, on average, XLM-Roberta performs the best of all the tested models in terms of accuracy, scoring 99% when tested on English, 99% when tested on French, and 95% when tested on Russian.
dc.description.copyright: © Dakota Staples, 2023
dc.format.extent: x, 74
dc.format.medium: electronic
dc.identifier.uri: https://unbscholar.lib.unb.ca/handle/1882/37716
dc.language.iso: en
dc.publisher: University of New Brunswick
dc.relation: University of New Brunswick, Faculty of Computer Science
dc.rights: http://purl.org/coar/access_right/c_abf2
dc.subject.discipline: Computer Science
dc.title: A comparison of machine learning algorithms for zero-shot cross-lingual phishing detection
dc.type: master thesis
oaire.license.condition: other
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of New Brunswick
thesis.degree.level: masters
thesis.degree.name: M.C.S.
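
The zero-shot set-up described in the abstract (train on one language, test on a different one) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy emails, the two stand-in features, and the nearest-centroid "model" are hypothetical and are not the thesis data, features, or models (the thesis evaluates models such as XLM-Roberta). Only the train/test pairing over English, French, and Russian reflects the record.

```python
from itertools import permutations

# Hypothetical toy data, invented for illustration only (not the thesis corpus).
# Each item is (email_text, label) with label 1 = phishing, 0 = benign.
DATA = {
    "en": [("Verify your account now: http://example.test/login", 1),
           ("Lunch at noon tomorrow?", 0)],
    "fr": [("Confirmez votre compte : http://example.test/connexion", 1),
           ("On déjeune demain midi ?", 0)],
    "ru": [("Подтвердите аккаунт: http://example.test/vhod", 1),
           ("Обед завтра в полдень?", 0)],
}


def features(text):
    """Language-agnostic stand-in features: link presence and scaled length."""
    return (1.0 if "http" in text else 0.0, len(text) / 100.0)


def fit_centroids(examples):
    """Toy nearest-centroid model: the mean feature vector for each label."""
    sums, counts = {}, {}
    for text, label in examples:
        vec = features(text)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = features(text)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )


def zero_shot_accuracy(data):
    """Train on one language and test on a different one, for every ordered pair."""
    results = {}
    for train_lang, test_lang in permutations(data, 2):
        model = fit_centroids(data[train_lang])  # sees the training language only
        test_set = data[test_lang]               # language never seen in training
        correct = sum(predict(model, text) == label for text, label in test_set)
        results[(train_lang, test_lang)] = correct / len(test_set)
    return results


results = zero_shot_accuracy(DATA)
for (train_lang, test_lang), accuracy in sorted(results.items()):
    print(f"train={train_lang} test={test_lang} accuracy={accuracy:.2f}")
```

The point of the sketch is the evaluation loop, not the classifier: the model is fitted on one language and scored on another, so any accuracy it reports measures cross-lingual transfer. The numbers it prints come from the toy data and say nothing about the thesis results.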

Files

Original bundle
Name: Dakota Staples - Thesis.pdf
Size: 751.37 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.13 KB
Format: Item-specific license agreed upon to submission