An Effective Approach to Detect Label Noise
University of New Brunswick
As the use of Internet of Things (IoT) devices has grown in recent years, Machine Learning (ML) methods for attack detection in this domain have developed rapidly. However, ML models are vulnerable to various classes of adversarial attacks that aim to fool a model into making an incorrect prediction. For instance, label manipulation, or label flipping, is a type of adversarial attack in which the attacker manipulates the labels of training data, causing the trained model to be biased, to perform worse, or both. However, the number of samples that can be flipped in this type of attack may be limited, which restricts the choice of targets available to the attacker. Due to the importance of securing ML models against Adversarial Machine Learning (AML) attacks, particularly in the IoT domain, this thesis presents an extensive review of AML in IoT. Then, a classification of AML attacks is proposed based on the literature, creating a foundation for future research in this domain. Next, the thesis investigates the degree of negative impact of malicious label flipping attacks (intentional label noise) on IoT data. As accurate labels are necessary for ML training, exploring adversarial label noise is an important research topic. However, label noise in datasets is not always adversarial and may have other causes, such as careless data labelling. Classification is an essential task in machine learning, where the main objective is to predict the categories of unseen data. Label noise in training datasets, whether adversarial or non-adversarial, can degrade the performance of supervised classification. Given the growing interest in data-centric AI, which aims to improve the quality of training data without increasing model complexity, a range of research has been undertaken to tackle the label noise problem.
However, few works have investigated this problem in the IoT network intrusion detection domain. This thesis addresses label noise in the intrusion detection domain by presenting a framework to detect samples with noisy labels. The framework’s main components are the decision tree classification algorithm and active learning. It consists of two steps: first, making a decision tree robust to the label noise in a dataset; second, using this robust model, together with active learning based on uncertainty sampling, to detect noisy samples effectively. In this way, the inherent resilience of the decision tree algorithm to label noise is used to tackle the problem in datasets. In our experiments, the proposed framework detects a considerable number of noisy samples in the training dataset, with up to 98% noise reduction. The proposed detection method can also serve as a defense against random label flipping attacks, in which adversarial label manipulation is applied randomly.
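The two-step idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the implementation used in this thesis. The synthetic two-feature data, the small tree learner (`build_tree`), the depth and leaf-size limits used here as a stand-in for making the tree robust to noise, the query budget, and the simulated annotator are all assumptions made for illustration. The query step ranks samples by the confidence the tree assigns to their recorded label, which is a simplified stand-in for an uncertainty-sampling query and may differ from the criterion used in the actual framework.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth=0, max_depth=2, min_leaf=20):
    """Grow a shallow tree; the depth and leaf-size limits (assumed values)
    keep it from memorising individual flipped labels."""
    leaf = {"p1": sum(y) / len(y)}  # fraction of class 1 in this node
    if depth == max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return leaf
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "f": f, "t": t,
        "left": build_tree([X[i] for i in li], [y[i] for i in li],
                           depth + 1, max_depth, min_leaf),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri],
                            depth + 1, max_depth, min_leaf),
    }

def prob_class1(tree, row):
    """Class-1 frequency of the leaf that `row` falls into."""
    while "f" in tree:
        tree = tree["left"] if row[tree["f"]] <= tree["t"] else tree["right"]
    return tree["p1"]

# Synthetic stand-in for a labelled dataset (hypothetical, not IoT traffic).
rng = random.Random(0)
n = 300
X = [[rng.random(), rng.random()] for _ in range(n)]
y_clean = [int(row[0] > 0.5) for row in X]

# Random label flipping: 10% of the labels are inverted.
flipped = set(rng.sample(range(n), 30))
y_noisy = [1 - y if i in flipped else y for i, y in enumerate(y_clean)]

# Step 1: fit the constrained tree on the noisy labels.
tree = build_tree(X, y_noisy)

# Step 2: score each sample by the tree's confidence in its recorded label
# and query the least-confident ones, up to a fixed budget.
confidence = []
for row, label in zip(X, y_noisy):
    p1 = prob_class1(tree, row)
    confidence.append(p1 if label == 1 else 1 - p1)
budget = 45
suspects = sorted(range(n), key=lambda i: confidence[i])[:budget]

# The annotator is simulated with the clean labels; a real one would be a human.
detected = [i for i in suspects if y_noisy[i] != y_clean[i]]
remaining = len(flipped) - len(detected)
print(f"queried {len(suspects)}, found {len(detected)} of {len(flipped)} "
      f"flipped labels, {remaining} remain")
```

The budget is set above the number of flipped labels so that the annotator reviews some clean samples as well, which reflects the realistic case where the noise rate is not known in advance.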