K-spectrum Support Vector Machine classifier for spam filtering

dc.contributor.advisor: Mahanti, Prabhat
dc.contributor.advisor: Kim, Dongmin
dc.contributor.author: Yang, Ming
dc.date.accessioned: 2023-03-01T16:39:44Z
dc.date.available: 2023-03-01T16:39:44Z
dc.date.issued: 2013
dc.date.updated: 2016-12-14T00:00:00Z
dc.description.abstract: Traditionally, machine learning approaches to spam filtering, including the Support Vector Machine (SVM), use the bag-of-words text representation technique to represent their features. However, this technique does not take word order into account and is not suitable for languages that do not use white space as a word delimiter. It is therefore appealing to treat every email as a string of symbols by using a string-based approach. In this report, we implement a contiguous string-based approach, called the k-spectrum kernel, for use with an SVM in a discriminative approach to the spam classification problem. When using the k-spectrum SVM spam classifier, email texts are implicitly mapped into a high-dimensional feature space. The classifier produces a decision boundary in this feature space, and emails are classified based on whether they map to the positive (spam) or negative (non-spam) side of the boundary. Our experimental results demonstrate that the k-spectrum SVM spam classifier could offer an effective and accurate alternative to other commonly used approaches to spam filtering, such as Naive Bayesian classifiers and SVM classifiers based on Bag-of-Words (BOW).
dc.description.copyright: © Ming Yang, 2013
dc.description.note: A Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Computer Science. Electronic Only. (UNB thesis number) Thesis 9207. (OCoLC) 960951414
dc.format: text/xml
dc.format.extent: ix, 61 pages
dc.format.medium: electronic
dc.identifier.oclc: (OCoLC) 960951414
dc.identifier.other: Thesis 9207
dc.identifier.uri: https://unbscholar.lib.unb.ca/handle/1882/14288
dc.language.iso: en_CA
dc.publisher: University of New Brunswick
dc.rights: http://purl.org/coar/access_right/c_abf2
dc.subject.discipline: Computer Science
dc.subject.lcsh: Support vector machines
dc.subject.lcsh: Supervised learning (Machine learning)
dc.subject.lcsh: Kernel functions
dc.subject.lcsh: Spam filtering (Electronic mail)
dc.title: K-spectrum Support Vector Machine classifier for spam filtering
dc.type: master thesis
thesis.degree.discipline: Computer Science
thesis.degree.fullname: Master of Computer Science
thesis.degree.grantor: University of New Brunswick
thesis.degree.level: masters
thesis.degree.name: M.C.S.
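
Illustration of the k-spectrum kernel named in the abstract

The abstract describes emails being treated as strings and implicitly mapped into a high-dimensional feature space. A minimal sketch of that idea follows, assuming the standard definition of the k-spectrum kernel: the inner product of two vectors that count every contiguous length-k substring. This is not the report's own implementation, which is not reproduced in this record; the function names spectrum, k_spectrum_kernel and normalized_kernel are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def spectrum(text: str, k: int) -> Counter:
    """Count every contiguous length-k substring (k-mer) of text.

    Returns an empty Counter when text is shorter than k.
    """
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def k_spectrum_kernel(s: str, t: str, k: int) -> int:
    """Inner product of the k-mer count vectors of s and t.

    The feature space (one dimension per possible k-mer) is never
    built explicitly; only k-mers that occur in the inputs are visited.
    """
    cs, ct = spectrum(s, k), spectrum(t, k)
    if len(cs) > len(ct):
        cs, ct = ct, cs  # iterate over the smaller spectrum
    return sum(count * ct[kmer] for kmer, count in cs.items())


def normalized_kernel(s: str, t: str, k: int) -> float:
    """Cosine-normalized kernel, so message length matters less.

    Normalization is an assumption of this sketch, not something
    the abstract states.
    """
    denom = math.sqrt(k_spectrum_kernel(s, s, k) * k_spectrum_kernel(t, t, k))
    return k_spectrum_kernel(s, t, k) / denom if denom else 0.0
```

A Gram matrix built from such kernel values could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's SVC with kernel="precomputed"; whether the report used that route is not stated here.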

Files

Original bundle
Name: item.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format