UNB Scholar Research Repository :: Browsing by Author "Ghorbani, Ali"

Now showing 1 - 20 of 38

A behavioral based detection approach for business email compromises
(University of New Brunswick, 2019) Maleki, Nasim; Ghorbani, Ali
The most recent infection vector in email attacks is Business Email Compromise (BEC), which gives attackers an entry point into an enterprise network and access to valuable company data. According to the Symantec Internet Security Threat Report (ISTR), around 7,710 organizations are hit by a BEC attack every month. A BEC is a type of phishing attack in which criminals impersonate a person of authority in an organization (such as the CEO) through spoofing or account takeover. Since spoofing techniques are detectable using SPF, DMARC, and DKIM, we proposed and implemented a behavioral-based framework for detecting BEC when accounts or machines are compromised. This framework stops malicious emails on the sender side, because the receiver side rarely holds enough of the sender's email to build a representative user profile. Moreover, a compromised account or machine turns into a devastating weapon targeting many people. Hence the attack ought to be stopped on the sender side, and the real owner should be notified of the compromise. In experiments on the Enron dataset, our framework reached an overall average of 92% accuracy and a 93% F1 score across all users.
A deep learning based framework for detecting and visualizing online malicious advertisement
(University of New Brunswick, 2018) Zhang, Xichen; Ghorbani, Ali
A detection framework for android financial malware
(University of New Brunswick, 2019) Abdul Kadir, Andi Fitriah; Ghorbani, Ali; Stakhanova, Natalia
As attempts to thwart cybercrime have intensified, so have innovations in how cybercriminals provision their infrastructure to sustain their activities. What motivates cybercriminals has also changed, from ego and status to interest and money, and cybersecurity researchers have accordingly turned their attention to financial malware, especially on the Android platform. Researchers face several major issues in detecting Android financial malware. Firstly, what constitutes Android financial malware is still ambiguous. Malware labelling is inconsistent: most current detection systems emphasize recognizing generic malware types (e.g., Trojan, Worm) rather than indicating capabilities (e.g., banking malware, ransomware). Without knowing what constitutes financial malware, detection systems cannot accurately recognize advanced and sophisticated financial malware. Secondly, most current anomaly-based detection systems that use machine learning suffer from inaccurate evaluation and comparison due to the lack of adequate datasets, which results in unreliable outputs for real-world deployment. Because the processes involved are time consuming, most available datasets are crafted mainly for static analysis, and those created for dynamic analysis are installed on an emulator or sandbox. Sophisticated malware can bypass these approaches; malware authors employ obfuscation methods and a wide range of anti-emulator techniques, with which malware programs hide their malicious activities once they detect an emulator. These deficiencies are among the major reasons why Android financial malware is able to avoid detection. Deploying reliable defence mechanisms against these attacks requires a comprehensive understanding of existing Android financial malware attacks, supported by a unified terminology and a high-quality dataset.
Therefore, we seek to understand trends and relationships between Android malware families and devise a taxonomy of Android financial malware attacks. In addition, a systematic approach to generate the required datasets is presented to address the need to use physical platforms instead of emulators. In this regard, an automated dynamic analysis system running on smartphones is developed to generate the desired dataset in a testbed environment. In order to correlate the generated dataset and the proposed taxonomy, a hybrid framework for malware detection is presented. We propose a novel combination of both static and dynamic analysis based specifically on features derived from the string literal (statically via reverse engineering) and network flow (dynamically on smartphones). This combination can assist security analysts in recognizing the threats effectively. We employ five common classifiers to construct the best model to identify malware at four levels: detecting malicious Android apps, classifying Android apps with respect to malware category and sub-category, and characterizing Android apps according to malware family. Specifically, a dataset containing over 5,000 samples is used to evaluate the performance of the proposed method. The experimental results show that the proposed method with a Random Forest classifier achieves an accuracy of over 90% with a very low false positive rate of 4% on average.
A dynamic graph-based malware classifier
(University of New Brunswick, 2016) Jazi, Hossein Hadian; Ghorbani, Ali
The anti-virus industry receives a vast number of new malware samples on a daily basis. The prevalence of new sophisticated instances, for most of which no signature is available, coupled with the significant growth of potentially harmful programs, has made the adoption of an effective automated classifier almost inevitable. Given the wide variety of obfuscation techniques employed by malware authors, extracting a high-level representation of malware structure is an efficient approach. High-level graph representations such as Function Call Graphs or Control Flow Graphs can represent the main functionality of a given sample in a more abstract way. Graph-based approaches have mostly revolved around static analysis of the binary and share the common drawbacks of any static approach. For example, a graph generated from a packed executable does not reflect the real structure of the code at all. In addition to the type of analysis, the scalability of these approaches is affected by the graph comparison algorithm employed. Full graph comparison is by itself an NP-hard problem. Approximate graph comparison algorithms such as Graph Edit Distance have been commonly studied in the field of graph classification. To address these two major weaknesses of current graph-based approaches, we propose a dynamic and scalable graph-based malware classifier. At the time of this proposal, this is the first attempt to generate and classify dynamic graphs. Although dynamic analysis provides more accurate graphs, it also generates larger graphs, which aggravates the comparison problem. To address this, we modify an existing algorithm, Simulated Annealing, to reduce computational complexity. To obtain a reasonable estimate of its effectiveness, our proposed system is compared against Classy, a state-of-the-art graph-based system.
Our results show that the proposed classifier outperforms Classy, achieving an average classification accuracy of 94% and a 4% false positive rate while leaving only 2% of samples unlabeled.
A genetic-algorithm-based solution for HTTP-based malware signature generation
(University of New Brunswick, 2014) Pourafshar, Amir; Ghorbani, Ali
The rising prevalence of malware has become the most serious threat to Internet security. To minimize the devastating impact of this threat, many malware detection strategies and systems have been developed in recent years. This thesis presents a novel malware signature generation and evolution system to detect never-before-seen malware. We focus on the automatic generation of evolved signatures for HTTP-based malware traces, based on the features and structure of currently known malware. The idea is that we can evolve signatures of known malware to predict the structure of future malware traces, since new malware usually inherits some of its characteristics and structure from its predecessors. We implemented a proof-of-concept version of our proposed evolutionary signature generation system. Datasets of malicious and legitimate network traffic were used to evaluate the proposed system. Experimental results show the system's ability to detect an acceptable portion of new, unknown malware samples while maintaining a low false alarm rate. Using the base and evolved signatures together increased the average detection rate of unknown malicious traces from 38.4% to 50.8%. This improvement is achieved while the average false positive rate of the evolved signature sets is 2.7 × 10⁻³.
A Phishing e-mail detection approach using machine learning techniques
(University of New Brunswick, 2017) Mbah, Kenneth Fon; Ghorbani, Ali
According to APWG reports from 2014 and 2015, the number of unique Phishing e-mail reports received from consumers increased sharply, from 68,270 in October 2014 to 106,421 in September 2015. This significant increase is evidence of the prevalence of Phishing attacks and of the damage they have caused to Internet users. Because the literature pays no specific attention to detecting Phishing e-mails related to advertisement and pornography, attackers increasingly use these lures to track users and adjust their attacks based on users' behaviours and on hot topics extracted from community news and journals. We focus on detecting deceptive e-mail, a form of Phishing attack, by proposing a novel framework that accurately identifies not only e-mail Phishing attacks but also the advertisement and pornographic e-mails used as lures to launch Phishing. Our approach, the Phishing Alerting System (PHAS), can detect and alert on all types of deceptive e-mail to help users make decisions. Using a well-known e-mail dataset and our extracted features, we obtain about 93.11% accuracy with machine learning techniques such as the J48 Decision Tree and KNN. Furthermore, we evaluated our system built on these features and obtained approximately the same accuracy when using the same dataset as input.
A sarcasm detection framework in Twitter and blog posts based on varied range of feature sets
(University of New Brunswick, 2016) Minaee, Hamed; Ghorbani, Ali
This thesis addresses the problem of sarcasm detection using a framework designed to detect sarcastic blog and microblog posts effectively. The framework consists of two components, each made up of sub-components including a crawler, preprocessing, and classification. Long-text sarcasm detection is a two-step process; in each step, we use several feature sets along with different classifiers. These feature sets are used to analyze each blog post as a whole as well as every sentence in isolation. In the first step, a Scoring Component classifies documents as sarcastic or non-sarcastic. Then, to find the sarcastic sentences in each sarcastic document, a Decision Tree is applied. Considering the difficulty of sarcasm detection, the Document Level Sarcasm Detection achieved an outstanding result: a precision of 75.7%. For short text, a Decision Tree is applied to classify tweets as sarcastic or non-sarcastic. This component obtains a precision of 86.6%, which is very good considering the difficulty of sarcasm detection and the inherent complexity of Twitter texts.
A sentiment analysis framework for social issues
(University of New Brunswick, 2015) Karamibekr, Mostafa; Ghorbani, Ali
Sentiment analysis investigates attitudes, feelings, and expressed opinions regarding products, services, topics, or issues. Subjectivity classification, which categorizes text as objective or subjective, is one application of sentiment analysis. Sentiment classification, another application, categorizes the polarity of an opinion, mostly as positive or negative. This research focuses on the sentiment analysis of social issues. We conducted a study that shows statistically that the factors affecting opinions in product domains differ from those in social domains. Based on the findings of this study, a framework is proposed for sentiment analysis of social issues. This framework considers the role of the verb in sentiment and defines a quadruple structure for an opinion that consists of opinion author, opinion target, opinion expression, and opinion time. One benefit of the proposed framework is that it extracts expressed opinions that can be used for various applications such as subjectivity classification, sentiment polarity classification, sentiment summarization, sentiment visualization, and sentiment comparison. We evaluated the performance of our proposed framework on public comments regarding abortion as a social issue. We implemented two applications of sentiment analysis: subjectivity classification and polarity classification.
Achieving communication-efficient privacy-preserving range query in fog-based IoT
(University of New Brunswick, 2021) Mahdikhani, Hassan; Lu, Rongxing; Ghorbani, Ali
Fog-based IoT (Internet of Things) is a fast-growing technology in which many firms and industries are currently investing to develop their own real-time, low-latency decentralized data processing and analysis applications. It narrows the gap between the cloud and IoT end-devices, as cloud computing is not a consistently perfect solution for many IoT applications. Compared with traditional IoT solutions, fog-enabled IoT can offer a high level of compliance, better efficiency, and stronger security by providing local data pre-processing, filtering, and forwarding mechanisms. These benefits make fog-enhanced IoT an appropriate paradigm for many IoT services in applications ranging from health monitoring systems to smart grids and even food manufacturing. However, fog-enhanced IoT raises many security and privacy concerns, since fog nodes are deployed at the network edge and may not be fully trustworthy. Furthermore, fog is considered a non-trivial extension of the cloud, and thus some security and privacy challenges will persist. These challenges might affect the adoption of fog computing in the IoT. At the same time, fog improves the IoT end-devices' security and privacy by offering an ideal platform for homomorphic encryption schemes. Homomorphic encryption schemes allow mathematical operations to be performed on ciphertexts without violating the IoT devices' privacy. This means that instead of separately delivering each IoT device's data to the control center, the fog nodes can forward the encrypted aggregated results. This alternative approach significantly reduces the communication overhead and greatly strengthens security. Thus, system developers can design data aggregation algorithms that yield more bandwidth-efficient, secure, and private schemes than traditional cloud deployment. In this thesis, we focus on range aggregate queries in fog-enhanced IoT.
In particular, we study communication- and computation-efficient privacy-preserving range query processing schemes in which the querying user can efficiently execute range queries on IoT end-devices in the fog computing environment. The main contributions of this thesis can be summarized as follows. 1) Taking the computational burden into consideration, we devise an efficient Symmetric Homomorphic Encryption (SHE) scheme. The proposed scheme maintains data privacy and security and supports homomorphic calculation in arithmetic circuits, including both multiplication and addition operations. 2) To achieve higher communication performance, we develop range decomposition/composition techniques to transform range queries. These techniques transform a given range query [L, U] into corresponding data structures that realize privacy-preserving, communication-efficient range aggregate query protocols. We develop three different decomposition/composition schemes and investigate their computational and communication performance. 3) We analyse the security of the developed schemes to ensure that they are privacy-preserving, i.e., that the querying user's query and the IoT end-devices' data cannot be identified or profiled by either fraudulent/dishonest or honest-but-curious entities. 4) We conduct extensive performance evaluations to demonstrate the effectiveness of the proposed schemes in terms of communication outcomes and computational effort reduction.
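The abstract does not describe the three decomposition schemes themselves. As a rough illustration of what decomposing a range query [L, U] can look like, the sketch below uses one well-known technique: splitting an inclusive integer range into aligned power-of-two blocks, each of which corresponds to a single binary prefix. Whether the thesis uses this particular scheme is an assumption, and the function name `decompose_range` is hypothetical.

```python
def decompose_range(lo: int, hi: int) -> list[tuple[int, int]]:
    """Split the inclusive integer range [lo, hi] into aligned power-of-two blocks.

    Each block is returned as (start, size), where size is a power of two and
    start is a multiple of size, so every block matches one binary prefix.
    Illustrative only: not the scheme from the thesis.
    """
    if lo < 0 or hi < lo:
        raise ValueError("expected 0 <= lo <= hi")
    blocks = []
    while lo <= hi:
        # Largest power of two at which lo is aligned (any size works for lo == 0).
        size = lo & -lo if lo > 0 else 1 << (hi + 1).bit_length()
        # Shrink the block until it fits inside the remaining range.
        while lo + size - 1 > hi:
            size >>= 1
        blocks.append((lo, size))
        lo += size
    return blocks
```

For example, `decompose_range(3, 10)` yields the blocks `(3, 1)`, `(4, 4)`, `(8, 2)`, and `(10, 1)`, so a query over eight values is expressed with four prefixes instead of eight individual points.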
Achieving more effective fraud detection
(University of New Brunswick, 2021) Erfani, Masoud; Ghorbani, Ali
Nowadays, most financial transactions around the world are virtual. The rapid growth in the use of credit cards and transnational online applications has increased fraudulent activity on these services, making fraud detection a challenging real-world problem. One of the main challenges in fraud detection is imbalanced datasets, which contain very few fraud cases and a massive number of non-fraud samples. In addition, the behavior of fraud changes frequently, which complicates the learning process for state-of-the-art binary machine learning classifiers. In this thesis, we therefore propose two effective frameworks for fraud detection to deal with this challenge. Our first framework consists of a novel preprocessing and subsampling step, followed by deep support vector data description for fraud detection. In our second framework, we introduce two versions of an ensemble of one-class classifiers. In the Bagging version, we use bootstrapping to create different training datasets for the weak learners, which together form a more robust model. In the Stacking version, we divide the training dataset into two folds. We train the weak learners on the first fold. Then, we add their predictions on the remaining part of the training dataset to the second fold. Finally, the meta learner is trained on the second fold to make the final prediction. These two steps form a more robust model for dealing with the imbalance problem. Furthermore, we provide a trend analysis based on the sizes of the training and test datasets and the performance of the model, using Area Under the Receiver Operating Characteristic Curve (ROC-AUC), Average Precision (AP), and F1 as metrics on a real-world dataset. We also evaluate our frameworks on a publicly available synthetic dataset to measure their performance in a complex situation.
Finally, based on the results, both of our approaches outperform SVM and Random Forest, taken as state-of-the-art binary classifiers, in different scenarios. They achieve remarkable performance, with best results of 90%, 93%, and 85% for AP, ROC-AUC, and F1, respectively.
An automatic authorship attribution technique for Android applications
(University of New Brunswick, 2017) Robledo, Hugo; Ghorbani, Ali; Stakhanova, Natalia
An intelligent malware classification framework
(University of New Brunswick, 2015) Samani, Elaheh; Ghorbani, Ali
Malicious software, or malware, has become a primary source of most attacks taking place across the Internet over the last decades. The prevalence of new malware for which signatures are not available, along with the challenge anti-malware software faces in keeping up with the continuous stream of new malware, has made the adoption of classification/clustering approaches necessary. Machine-learning methods have been applied extensively to classify or cluster malware into families, based on different features derived from static or dynamic analysis of the malware. While these approaches demonstrate promise, they are themselves subject to a growing array of countermeasures. In this work, we propose a framework to enhance traditional machine learning-based classification by utilizing high-level domain knowledge. We outline the major behaviours of Windows malware from an analyst's point of view and provide possible methods (rules) to extract them from the output of static and dynamic analysis tools. We also take advantage of memory forensics to extract other stealthy aspects of an executable, which otherwise remain undetected. Our comparative experimental results against state-of-the-art malware classification approaches confirm the effectiveness of our framework, with an average classification accuracy of 81% while leaving only 0.5% of samples unlabeled.
An SMS-based mobile botnet detection framework using intelligent agents
(University of New Brunswick, 2016) Alzahrani, Abdullah; Ghorbani, Ali
Along with increasing security measures in Android platforms, the amount of Android malware that uses remote exploits has grown significantly. Using mobile botnets, attackers concentrate on reliable attack vectors such as SMS messages. Short Message Service (SMS) has been increasingly targeted by malicious applications ("apps") that abuse SMS features in order to send spam, to transfer command and control (C&C) instructions, to distribute malicious applications via URLs embedded in text messages, to send text messages to premium-rate numbers, and to exploit smartphones. Efficient detection and defence techniques that use filtering and blocking methods for SMS botnets are therefore urgently needed. Unfortunately, most botnet detection solutions proposed so far are reactive; that is, they require a large amount of data in order to effectively generate signatures and filtering rules to differentiate between normal and malicious SMS messages. By using proactive approaches such as a multi-agent system, agents can monitor certain environments and report abnormal behaviour in order to protect user data. In this thesis, we propose an SMS-based botnet detection framework that uses intelligent agents to detect malicious SMS messages and monitor the smartphone resources typically targeted by SMS botnet attacks. The proposed detection framework is based on a multi-layer model which consists of three modules and intelligent agents. The first is an SMS signature-based detection module to combat SMS botnets, in which we first apply pattern-matching detection approaches to incoming and outgoing SMS text messages, and then use rule-based techniques to label unknown SMS messages as suspicious or normal.
The second module, an anomaly-based detection module, employs unsupervised learning techniques, using clustering algorithms to group SMS messages into four classes and to assign reported text messages to one of them. The module also uses a robust and efficient behavioural profiling analysis to detect whether there are any correlations between classification results and alerts from profiling analysis. Rule-based correlations are used to label SMS messages as either normal or malicious. The third module is a defence module that serves as a more proactive approach, directly generating signatures and rules in order to protect Android smartphones from abuse by SMS botnets. This module is used to generate signatures of malicious SMS messages, to update phone number blacklists, to analyze malicious applications, and to send feedback to Android smartphones so that the user can take action. Finally, a multi-agent system is used to observe Android mobile devices and to interact with service provider agents in order to detect malicious applications and SMS botnet activities on those devices. We have developed an intelligent and proactive framework that scans incoming and outgoing text messages, monitors Android resources, and observes user usage, including user connectivity time. The framework creates a user profile that is used to perform behavioural profiling analysis in order to identify malicious SMS messages and cut the C&C channel. The proposed framework has been implemented using JADE agents. To demonstrate the capability of the multi-agent system, the signature-based detection module, the anomaly-based detection module, and the defence module in accurately detecting SMS botnets, we conduct different experiments in three phases. In the first phase, we focus on evaluating the efficiency of the SMS signature detection module on Android devices. This module was evaluated using over 12,000 test messages.
It was able to detect all 747 malicious SMS messages in the dataset (100% detection rate with no false negatives). It also flagged 351 SMS messages as suspicious. A comprehensive performance analysis of the anomaly-based detection module is conducted in the second phase. On the applied datasets, the anomaly-based detection module has an average accuracy of 95% and an average false negative rate of 3.95%. After studying the performance of each module individually, in the last phase we analyzed the overall performance of the proposed framework and provided a thorough analysis of the JADE agents' monitoring mechanism. We used approximately 60,000 test messages to evaluate the proposed framework. The signature detection agents reported 165 malicious SMS messages and 3,081 suspicious SMS messages. The anomaly-based detection module labelled 941 SMS messages as malicious.
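The two-stage labelling in the signature-based module (pattern matching first, then rules for unknown messages) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the signature list, the single "contains a URL" rule, and the function name `label_sms` are all assumptions made for the example.

```python
import re

# Hypothetical signatures of known-malicious messages (assumed for illustration).
KNOWN_MALICIOUS = [
    re.compile(r"https?://malicious\.example/\S*", re.IGNORECASE),
]

# Hypothetical rule for unknown messages: any embedded URL is worth a closer look.
URL_RULE = re.compile(r"https?://\S+", re.IGNORECASE)


def label_sms(text: str) -> str:
    """Label one SMS message as 'malicious', 'suspicious', or 'normal'."""
    # Stage 1: pattern matching against known signatures.
    if any(sig.search(text) for sig in KNOWN_MALICIOUS):
        return "malicious"
    # Stage 2: rule-based check for messages that match no signature.
    if URL_RULE.search(text):
        return "suspicious"
    return "normal"
```

In this sketch, a message flagged as suspicious would be handed to a later stage, such as the anomaly-based module, rather than blocked outright.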
Bursty event discovery from online news outlets
(University of New Brunswick, 2015) Kochak, Seyed; Ghorbani, Ali
In this thesis, we have developed a set of methods, along with a framework, for discovering bursty events and their relationships from streams of online news articles. Bursty event discovery can be done using the discovered bursty terms, which are significantly fewer than the original feature set. Moreover, the discovered bursty events are compared in order to discover any potential relational link between any two of them. This work assumes that bursty events and their relationships in time can provide useful information to firms and individuals whose decision-making process is significantly affected by news events. The system performed at a 64% level of accuracy on a real-world dataset. The results, together with the implicit measures, show great promise that our proposed framework and methods can be used in real-world applications.
Bursty topic detection using acceleration and user influence
(University of New Brunswick, 2017) Ali, Rizwan; Ghorbani, Ali
In this thesis, we present a system which detects bursty topics from real-time social media data. Bursty topics are topics which get a sudden surge in their mentions online in a very short period. They are detected using acceleration of keywords from the real-time data. The acceleration of a bursty topic is measured using the increase in appearance of keywords of a topic over a small period. Along with acceleration, user influence is used to score keywords in our system to improve the detection by increasing the keyword precision of the detected bursty topics. Bursty topics are formed using the top scoring keywords, based on acceleration and influence, and grouping them based on the similarities in their term document vectors. We use soft frequent pattern mining approach for generating topics. The bursty topics are also linked to bursty topics detected on previous time windows by comparing similarities in their keywords. Bursty topics detected using acceleration are evaluated with and without user influence score, with a baseline topic model. We use Latent Dirichlet Allocation topic model as the baseline. It is found that user influence helped the topic detector improve its precision by 11% on average. The results show that user influence can add great value to bursty topic detection methods.
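One simple way to read "acceleration" here is as the second difference of a keyword's mention count across consecutive time windows: the change in how fast its count is changing. The sketch below computes that quantity over the last three windows. It is an assumption about how acceleration might be measured, not the formula from the thesis; the function name `keyword_acceleration` is hypothetical, and the user influence weighting is left out.

```python
from collections import Counter


def keyword_acceleration(windows: list[list[str]]) -> dict[str, int]:
    """Return each keyword's second difference in count over the last three windows.

    `windows` holds one list of keyword mentions per time window, oldest first.
    """
    if len(windows) < 3:
        raise ValueError("need at least three time windows")
    c0, c1, c2 = (Counter(w) for w in windows[-3:])
    keywords = set(c0) | set(c1) | set(c2)
    # Velocity is the change in count between consecutive windows;
    # acceleration is the change in velocity.
    return {k: (c2[k] - c1[k]) - (c1[k] - c0[k]) for k in keywords}
```

A keyword whose count goes 1, 2, 6 across three windows has velocities 1 and 4 and therefore an acceleration of 3, while a keyword with a steady count has an acceleration of 0.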
Conversation-based P2P botnet detection with decision fusion
(University of New Brunswick, 2013) Zhang, Shaojun; Ghorbani, Ali
Botnets have been identified as one of the most dangerous threats on the Internet. A botnet is a collection of compromised computers, called zombies or bots, controlled by malicious machines called botmasters through a command and control (C&C) channel. Botnets can be used for many malicious purposes, including DDoS, spam, and stealing sensitive information, to name a few, all of which can be very serious threats to parts of the Internet. In this thesis, we propose a peer-to-peer (P2P) botnet detection approach based on 30-second conversations. To the best of our knowledge, this is the first time conversation-based features are used to detect P2P botnets. By applying machine learning techniques, the features extracted from conversations can differentiate P2P botnet conversations from normal conversations. Feature selection is also carried out in order to reduce the dimension of the feature vectors. Decision tree (DT) and support vector machine (SVM) classifiers are applied to separate normal conversations from P2P botnet conversations. Finally, the results from the different classifiers are combined based on probability models in order to get a better result.
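The abstract does not say which probability model is used for the fusion step. A minimal sketch, assuming the simplest option of averaging the class probabilities reported by each classifier (the sum rule), might look like this; the function name and the example probabilities are hypothetical.

```python
def fuse_probabilities(prob_sets: list[dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Average per-class probabilities from several classifiers and pick the best class.

    Each entry of `prob_sets` maps every class label to one classifier's probability.
    Averaging is an assumed fusion rule, not necessarily the one used in the thesis.
    """
    if not prob_sets:
        raise ValueError("need at least one classifier output")
    labels = prob_sets[0].keys()
    fused = {
        label: sum(probs[label] for probs in prob_sets) / len(prob_sets)
        for label in labels
    }
    # The fused decision is the label with the highest average probability.
    return max(fused, key=fused.get), fused
```

With this rule, a confident classifier can outvote a hesitant one: if a decision tree gives a conversation a 0.4 botnet probability and an SVM gives 0.9, the fused probability is 0.65 and the conversation is labelled as botnet traffic.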
Distributive continuous profiling for IoT devices
(University of New Brunswick, 2021) Safi, Miraqa; Ghorbani, Ali; Lashkari, Arash Habibi
The proliferation of heterogeneous IoT devices connected to the internet creates security and operational challenges for network administrators and industries, who must detect, identify, and monitor millions of interconnected IoT devices. Network administrators and industries need to understand which kinds of IoT devices have joined or are trying to connect to their network, which devices are functional, which devices need security updates, and which devices are vulnerable to specific attacks. Furthermore, limited storage and computing power, small keys for cryptographic operations, and common vulnerabilities in specific devices create points of intrusion for attackers. Industries need to identify and monitor the connected devices' specific behavior and isolate suspected and vulnerable devices within the network for further monitoring. In this thesis, we propose a distributive continuous profiling model for identifying the local node of IoT devices, mapping them to their common vulnerabilities, and continuously updating the profile. We also provide a comprehensive review of various IoT device profiling methods and a clear taxonomy of IoT profiling techniques based on different security perspectives. We investigated and analyzed numerous current IoT device vulnerabilities and multiple features, and provided detailed information useful for implementing risk assessment and mitigation for the organizational network. We used a hybrid set of features and extracted 58 features from the network traffic generated by IoT devices. We introduced 23 new features for the profiling approach to identify IoT devices with improved accuracy and shorter training time than existing methods. We experimented with 18 machine learning classifiers on three publicly available datasets, including 81 IoT and six non-IoT devices.
In the proposed method, random forest and the decision tree classifier outperform the other classifiers; both have an average accuracy, precision, recall, and f1-score of above 90% with a short training time. Decision Tree requires less time to train the model, which helps continuously update the devices' profile.
DNA-Droid: a real-time Android ransomware detection framework
(University of New Brunswick, 2017) Ghraib, Amirhossein; Ghorbani, Ali
Domain generation algorithm (DGA) detection
(University of New Brunswick, 2020) Upadhyay, Shubhangi; Ghorbani, Ali
Domain names play a crucial role today: they are designed for humans to refer to the access points they need, and every domain name has certain characteristics that justify its existence. A technique was developed to generate domain names algorithmically, with the idea of solving the problem of designing domain names manually. DGAs are immune to static prevention methods like blacklisting and sinkholing. Attackers deploy highly sophisticated tactics to compromise end-user systems and gain control of them as targets for spreading malware. Multiple detection attempts have been made using lexical feature analysis and domain query responses from blacklists or sinkholes, and some of these techniques have been quite effective. In this research, we design a framework to detect DGAs even in real network traffic, using features studied from legitimate domain names in static and real traffic and treating feature extraction as the key to the proposed framework. The detection process consists of detection, prediction, and classification, attaining a maximum accuracy of 99% even without using neural networks or deep learning techniques.
Early Stage Botnet Detection and Containment via Mathematical Modeling and Prediction of Botnet Propagation Dynamics
(2010) Rrushi, Julian; Mokhtari, Ehsan; Ghorbani, Ali
The research that we discuss in this technical report shows that mathematical models of botnet propagation dynamics are a viable means of detecting early stage botnet infections in an enterprise network, and thus an effective tool for containing those infections in a timely fashion. The main idea that underlies this research is to localize weakly connected subgraphs within a graph that models network communications between hosts, treat those subgraphs as representatives of suspected botnets, and employ applied statistics to infer the underlying propagation dynamics. The inferred dynamics are materialized into a model graph, which we use within a subgraph isomorphism search process to determine whether there is a match between the inferred propagation dynamics and the actual propagation dynamics observed from the weakly connected subgraphs. We conduct modeling based on an intersection of statistics and graph theory, such that a match between the two leads to a timely identification of infected hosts. Our mathematical modeling relies on measures of network vulnerability rates, which in this research we estimate via a statistical approach that draws on epidemiological models in biology. That estimation approach is based on random sampling and follows a novel application of statistical learning and inference in a botnet-versus-network setting. We have implemented this research in the Matlab and Perl programming languages and have validated its effectiveness in practice in the Emulab network testbed. We have also validated the vulnerability rate estimation approach extensively with respect to realistically simulated botnet propagation dynamics in a GTNetS network simulation platform. In the technical report we describe our overall approach in detail and discuss experiments, along with experimental data, that indicate the effectiveness of our approach in detecting early stage botnet infections in an enterprise network.
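The first step described above, locating weakly connected subgraphs in a directed host communication graph, can be sketched in a few lines. This is an illustrative breadth-first search written for this listing (the report's implementation is in Matlab and Perl); the function name and the edge-list input format are assumptions.

```python
from collections import defaultdict, deque


def weakly_connected_components(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group hosts into weakly connected components of a directed graph.

    `edges` is a list of (source_host, destination_host) pairs. Direction is
    ignored when deciding connectivity, which is what "weakly" connected means.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search from an unvisited host collects one component.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components
```

Each returned set of hosts would then be a candidate subgraph for the statistical inference and subgraph isomorphism steps that the report describes.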
