Achieving more effective fraud detection

Date

2021

Publisher

University of New Brunswick

Abstract

Most financial transactions around the world are now conducted online. The rapid growth in the use of credit cards and transnational online applications has increased fraudulent activity involving these services, which makes fraud detection a challenging real-world problem. One of the main challenges is dataset imbalance: there are very few fraud cases and a massive number of non-fraud samples. In addition, fraud behavior changes frequently, which complicates learning for state-of-the-art binary machine learning classifiers. In this thesis, we propose two frameworks for fraud detection that address these challenges. Our first framework consists of a novel preprocessing and subsampling step followed by deep support vector data description for fraud detection. In our second framework, we introduce two versions of an ensemble of one-class classifiers. In the Bagging version, we use bootstrapping to create different training datasets for the weak learners, which together form a more robust model. In the Stacking version, we divide the training dataset into two folds. We train the weak learners on the first fold. Then, we add their predictions on the remaining part of the training dataset to the second fold. Finally, the meta learner is trained on the second fold to make the final prediction. These two steps form a more robust model for dealing with the class-imbalance problem. Furthermore, we provide a trend analysis relating the sizes of the training and test datasets to the performance of the model on a real-world dataset, using the Area Under the Receiver Operating Characteristic Curve (ROC-AUC), Average Precision (AP), and F1 as metrics. We also evaluate our frameworks on publicly available synthetic data to measure their performance in a complex setting.
Finally, the results show that both of our approaches outperform SVM and Random Forest, used as state-of-the-art binary classifiers, in different scenarios. In the best case, they achieve AP, ROC-AUC, and F1 scores of 90%, 93%, and 85%, respectively.
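
As an illustration only, the Bagging idea described in the abstract (bootstrap resamples of the training data, one weak one-class learner per resample, combined scores) can be sketched as follows. This is not the thesis implementation: the weak learner `CentroidDetector`, the helper names `fit_bagged_detectors` and `bagged_scores`, the averaging rule, and the synthetic data are all assumptions made for this example, and the abstract does not say which one-class classifiers or combination rule the thesis uses.

```python
import numpy as np


class CentroidDetector:
    """Hypothetical toy one-class learner: scores a point by its distance
    to the mean of the (non-fraud) data it was fitted on."""

    def fit(self, X):
        self.center_ = X.mean(axis=0)
        return self

    def score(self, X):
        # Larger score = further from the "normal" region = more suspicious.
        return np.linalg.norm(X - self.center_, axis=1)


def fit_bagged_detectors(X_normal, n_learners=10, seed=0):
    """Fit each weak learner on a bootstrap resample of the training data."""
    rng = np.random.default_rng(seed)
    n = len(X_normal)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        learners.append(CentroidDetector().fit(X_normal[idx]))
    return learners


def bagged_scores(learners, X):
    """Combine the weak learners by averaging their anomaly scores
    (an assumed combination rule, chosen for simplicity)."""
    return np.mean([m.score(X) for m in learners], axis=0)


# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(42)
X_train = rng.normal(0.0, 1.0, size=(500, 4))   # stand-in for non-fraud samples
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 4)),           # points resembling the training data
    rng.normal(6.0, 1.0, size=(5, 4)),           # points far from the training data
])

learners = fit_bagged_detectors(X_train)
scores = bagged_scores(learners, X_test)
print(scores.round(2))
```

In this toy setting the last five test points should receive clearly higher scores than the first five. A real pipeline would replace `CentroidDetector` with stronger one-class models and would choose a decision threshold on held-out data.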