Graduate Research
Browsing Graduate Research by Subject "Computer Science"
Now showing 1 - 20 of 317
Item: A behavioral based detection approach for business email compromises (University of New Brunswick, 2019) Maleki, Nasim; Ghorbani, Ali

The most recent infectious vector in email attacks is Business Email Compromise (BEC), an entry point for attackers to gain access to an enterprise network and obtain valuable company data. According to the Symantec Internet Security Threat Report (ISTR), around 7,710 organizations are hit by a Business Email Compromise attack every month. A BEC is a type of phishing attack in which criminals impersonate a person of authority in an organization (e.g., the CEO) through spoofing or account takeover. Since spoofing techniques are detectable using SPF, DMARC, and DKIM, we propose and implement a behavioral-based framework for detecting BEC when accounts or machines are compromised. The framework stops malicious emails on the sender side, because the receiver side typically lacks enough of a sender's email to build a representative user profile. Moreover, a compromised account or machine turns into a devastating weapon targeting many people; hence it should be stopped on the sender side, and the real owner should be notified of the compromise. In experiments on the Enron dataset covering all users, our framework reached an average accuracy of 92% and an average F1 score of 93%.

Item: A blockchain-based privacy-preserving medical insurance storage system (University of New Brunswick, 2019) Luong, Son; Lu, Rongxing

Blockchain technology is an innovative invention that is disrupting many industries, including business and healthcare. In this thesis, we propose a blockchain-based privacy-preserving medical insurance storage system. The system takes advantage of the decentralization and immutability properties of blockchain technology, and makes use of a (2,3)-threshold secret sharing scheme to achieve the privacy-preservation property.
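A minimal Shamir-style sketch of the (2,3)-threshold sharing mentioned above: a degree-1 polynomial is evaluated at three points, and any two shares recover the secret by Lagrange interpolation. The prime modulus and polynomial coefficient below are illustrative, not values from the thesis.

```python
# (2,3)-threshold secret sharing sketch: f(x) = secret + a*x (mod P),
# shares are (x, f(x)) for x = 1, 2, 3; any two shares suffice.
P = 2_147_483_647  # a Mersenne prime used as the field modulus (illustrative)

def make_shares(secret, a=123_456):
    # one share per helper hospital, for example
    return [(x, (secret + a * x) % P) for x in (1, 2, 3)]

def reconstruct(share_i, share_j):
    # Lagrange interpolation at x = 0 using two points
    (xi, yi), (xj, yj) = share_i, share_j
    li = (-xj) * pow(xi - xj, -1, P)  # basis coefficient for point i
    lj = (-xi) * pow(xj - xi, -1, P)  # basis coefficient for point j
    return (yi * li + yj * lj) % P

shares = make_shares(5000)  # e.g., a patient's spending amount
assert reconstruct(shares[0], shares[1]) == 5000
assert reconstruct(shares[1], shares[2]) == 5000
```

Because any single share reveals nothing about the secret, two non-colluding helpers can jointly answer queries without either one learning the underlying value.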
In an experimental setup of the system, there are a public blockchain, a patient, four hospitals (an owner hospital and three helper hospitals), and an insurance company. For a given patient, any hospital can become the owner hospital while the other three become helpers. The owner hospital holds the patient's spending records and publishes that data to the blockchain. The helper hospitals help the insurance company query the patient's spending data on the blockchain and perform homomorphic computations on the results. However, the helpers cannot learn anything about the patient's spending data, as long as there is no collusion between helpers. We deploy our system on the Ethereum blockchain and give a final performance evaluation.

Item: A Bloom filter based authentication scheme for vehicular digital twin (University of New Brunswick, 2024-03) Adeyiga, Olajide; Lu, Rongxing

The rapid growth of the automobile industry and the competitive nature of industry players have necessitated a closer connection between vehicles and their owners. This work explores in depth the use of a Bloom filter based mutual authentication scheme in a vehicular digital twin system. Current research into digital twins of vehicles within the IoT space shows that vehicles require a constant means of communication with their digital twin, while the digital twin likewise requires a means of communication with the IoVs and other digital twin systems. However, these systems exhibit significant security gaps: they are prone to adversarial attacks such as replay, anonymity, and linkability attacks, among others. The goal of this research is to implement an authentication scheme that provides a secure connection between all entities within a vehicular digital twin network.
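A minimal sketch of the Bloom filter membership test underlying such authentication schemes: a compact bit array answers "definitely not enrolled" or "possibly enrolled" in constant time. The array size, hash count, and salted-SHA-256 construction below are illustrative assumptions, not the thesis's parameters.

```python
import hashlib

M, K = 1024, 3  # bit-array size and number of hash functions (illustrative)

def _positions(item):
    # derive K bit positions from salted hashes of the item
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % M
            for i in range(K)]

class BloomFilter:
    def __init__(self):
        self.bits = [0] * M

    def add(self, item):
        for p in _positions(item):
            self.bits[p] = 1

    def may_contain(self, item):
        # False means "definitely not present"; True may be a false positive
        return all(self.bits[p] for p in _positions(item))

bf = BloomFilter()
bf.add("vehicle-42:credential")  # hypothetical enrolled credential
assert bf.may_contain("vehicle-42:credential")
```

The one-sided error (no false negatives) is what makes the filter suitable as a fast first check before a full credential verification.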
This scheme uses user credentials and private vehicle features to achieve mutual authentication.

Item: A cloud-based framework for smart grid data, communication and co-simulation (University of New Brunswick, 2021-10) Adeyemo, Gabriel; Kent, Kenneth B.

Renewable energy has driven rapid advancements in electric power systems. The advanced grid is a smart grid with information and communication technologies and bi-directional flow of information. Data in a smart grid aligns with the characteristics of big data, so choosing the most efficient technology to manage data in the grid (real and/or simulated) is crucial to the grid's performance. This project explores a framework that supports large-scale power and network co-simulation and manages communication and data in smart grid co-simulations, real-world smart grid systems, and a combination of both, using message-oriented middleware and cloud technologies. We designed and implemented a framework with RabbitMQ, Apache Kafka, OpenDSS, OMNeT++, Apache Spark, Docker and Kubernetes. We evaluate our implementation on accuracy, scale and usability with three applications, including a demand-response application based on logistic regression. The results of our evaluation meet the goals defined for the research thesis.

Item: A comparison of machine learning algorithms for zero-shot cross-lingual phishing detection (University of New Brunswick, 2023-08) Staples, Dakota; Hakak, Saqib; Cook, Paul

Phishing is a major problem worldwide. Existing studies have focused mainly on detecting emails in one language (mostly English); detecting emails in multiple languages is challenging due to a lack of datasets. Without ample data from which to learn, models cannot accurately distinguish benign emails from phishing emails, resulting in false positives and false negatives. This research compares the performance of numerous machine learning models and transformers using zero-shot learning for multilingual phishing detection.
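The zero-shot cross-lingual protocol this thesis compares models under can be sketched as: fit on one language, score on another with no target-language training data. The toy keyword "model" and the two-example datasets below are invented for illustration and stand in for the thesis's transformers.

```python
# Zero-shot evaluation sketch: a trivial keyword model trained on one
# language, evaluated on another. Labels: 1 = phishing, 0 = benign.
def train(examples):
    phish = {w for text, y in examples if y == 1 for w in text.split()}
    benign = {w for text, y in examples if y == 0 for w in text.split()}
    return phish - benign  # tokens indicative of phishing only

def predict(indicative, text):
    return 1 if any(w in indicative for w in text.split()) else 0

def zero_shot_accuracy(train_set, test_set):
    indicative = train(train_set)  # training language only
    hits = sum(predict(indicative, t) == y for t, y in test_set)
    return hits / len(test_set)

english = [("verify your account now", 1), ("meeting at noon", 0)]
french = [("verify votre account maintenant", 1), ("réunion à midi", 0)]
acc = zero_shot_accuracy(english, french)  # train English, test French
assert 0.0 <= acc <= 1.0
```

Multilingual transformers succeed at this because their shared subword vocabulary and pretraining let cues learned in one language transfer to another; the keyword model above transfers only through literally shared tokens.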
In a zero-shot learning set-up, the model is trained on one language and tested on another. English, French, and Russian emails are used as the training and testing languages. My results show that, on average, XLM-RoBERTa performs best of all the tested models in terms of accuracy, scoring 99% when testing on English, 99% on French, and 95% on Russian.

Item: A deep learning based framework for detecting and visualizing online malicious advertisement (University of New Brunswick, 2018) Zhang, Xichen; Ghorbani, Ali

Item: A detection framework for android financial malware (University of New Brunswick, 2019) Abdul Kadir, Andi Fitriah; Ghorbani, Ali; Stakhanova, Natalia

As attempts to thwart cybercrime have intensified, so have innovations in how cybercriminals provision their infrastructure to sustain their activities. Consequently, what motivates cybercriminals today has changed, from ego and status to interest and money, and cybersecurity researchers have turned their attention to financial malware, especially on the Android platform. Researchers face some major issues in detecting Android financial malware. Firstly, what constitutes Android financial malware is still ambiguous. There is a disparity in labelling the type of malware: most current detection systems emphasize the recognition of generic malware types (e.g., Trojan, Worm) rather than indicating a sample's capabilities (e.g., banking malware, ransomware). Without knowing what constitutes financial malware, detection systems cannot provide accurate recognition of advanced and sophisticated financial malware. Secondly, most current anomaly-based detection systems built via machine learning suffer from inaccurate evaluation and comparison due to the lack of adequate datasets, which results in unreliable outputs for real-world deployment.
Because dataset generation is time-consuming, most available datasets are crafted mainly for static analysis, and those created for dynamic analysis are collected on an emulator or sandbox. Sophisticated malware can bypass these approaches: malware authors employ obfuscation methods and a wide range of anti-emulator techniques, whereby malware programs attempt to hide their malicious activities upon detecting an emulator. These deficiencies are some of the major reasons why Android financial malware is able to avoid detection. A comprehensive understanding of existing Android financial malware attacks, supported by a unified terminology and a high-quality dataset, is required for the deployment of reliable defence mechanisms against these attacks. We therefore seek to understand trends and relationships between Android malware families and devise a taxonomy of Android financial malware attacks. In addition, a systematic approach to generating the required datasets is presented to address the need to use physical platforms instead of emulators. In this regard, an automated dynamic analysis system running on smartphones is developed to generate the desired dataset in a testbed environment. To correlate the generated dataset and the proposed taxonomy, a hybrid framework for malware detection is presented. We propose a novel combination of static and dynamic analysis based specifically on features derived from string literals (statically, via reverse engineering) and network flows (dynamically, on smartphones). This combination can assist security analysts in recognizing threats effectively. We employ five common classifiers to construct the best model to identify malware at four levels: detecting malicious Android apps, classifying Android apps with respect to malware category and sub-category, and characterizing Android apps according to malware family.
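The four detection levels described above (malicious or not, category, sub-category, family) can be sketched as a cascade in which each level refines the previous one. The feature names and toy decision rules below are hypothetical placeholders, not the thesis's string-literal and network-flow features or its trained classifiers.

```python
# Multi-level labelling cascade sketch: only apps flagged malicious
# proceed to category, sub-category, and family labelling.
def classify(features):
    levels = {}
    levels["malicious"] = features.get("suspicious_strings", 0) > 5
    if not levels["malicious"]:
        return levels  # benign apps get no further labels
    levels["category"] = ("banking" if features.get("contacts_bank_domain")
                          else "ransomware")
    levels["subcategory"] = (levels["category"] + "-sms"
                             if features.get("sends_sms")
                             else levels["category"])
    levels["family"] = levels["subcategory"] + "/family-unknown"
    return levels

app = {"suspicious_strings": 9, "contacts_bank_domain": True, "sends_sms": True}
result = classify(app)
assert result["malicious"] and result["category"] == "banking"
```

In the thesis the rules at each level are learned by classifiers such as Random Forest rather than hand-written; the cascade structure is the part this sketch illustrates.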
Specifically, a dataset containing over 5,000 samples is used to evaluate the performance of the proposed method. The experimental results show that the proposed method with a Random Forest classifier achieves an accuracy of over 90% with a very low false positive rate of 4% on average.

Item: A dynamic graph-based malware classifier (University of New Brunswick, 2016) Jazi, Hossein Hadian; Ghorbani, Ali

The anti-virus industry receives a vast number of new malware samples on a daily basis. The prevalence of new sophisticated instances, for most of which no signature is available, coupled with the significant growth of potentially harmful programs, has made the adoption of an effective automated classifier almost inevitable. Given the wide variety of obfuscation techniques employed by malware authors, extracting a high-level representation of malware structure is an efficient approach. High-level graph representations such as function call graphs and control flow graphs can represent the main functionality of a given sample in a more abstract way. Graph-based approaches have mostly revolved around static analysis of the binary and share the common drawbacks of any static approach; for example, a graph generated from a packed executable does not reflect the real structure of the code at all. In addition to the type of analysis, the scalability of these approaches is also affected by the employed graph comparison algorithm: full graph comparison is by itself an NP-hard problem, so approximate graph comparison algorithms such as Graph Edit Distance are commonly studied in the field of graph classification. To address the two major weaknesses of current graph-based approaches, we propose a dynamic and scalable graph-based malware classifier. At the time of this proposal, this is the first attempt to generate and classify dynamic graphs.
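Graph edit distance, mentioned above, counts the node and edge insertions and deletions needed to turn one graph into another. The crude upper bound below (fixed node identities, no substitutions) only illustrates the idea; real classifiers approximate GED with heuristics such as simulated annealing, and the toy call graphs are invented.

```python
# GED upper-bound sketch for graphs given as {node: set_of_successors}.
def edit_distance_upper_bound(g1, g2):
    nodes1, nodes2 = set(g1), set(g2)
    node_ops = len(nodes1 ^ nodes2)  # nodes to insert or delete
    edges1 = {(u, v) for u in g1 for v in g1[u]}
    edges2 = {(u, v) for u in g2 for v in g2[u]}
    edge_ops = len(edges1 ^ edges2)  # edges to insert or delete
    return node_ops + edge_ops

# two tiny call graphs: main->read->parse vs. main->read->decrypt
g_a = {"main": {"read"}, "read": {"parse"}, "parse": set()}
g_b = {"main": {"read"}, "read": {"decrypt"}, "decrypt": set()}
assert edit_distance_upper_bound(g_a, g_b) == 4  # 2 node ops + 2 edge ops
```

Because exact GED is NP-hard, practical systems settle for bounds or heuristic search over edit sequences; scalability then depends on how cheaply those approximations can be computed per pair of samples.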
Although it provides more accurate graphs, dynamic analysis generates larger graphs, aggravating the comparison problem. To address this, we modify an existing Simulated Annealing algorithm to reduce computational complexity. For a reasonable estimate of effectiveness, our proposed system is compared against Classy, the state-of-the-art graph-based system. Our results show that the proposed classifier outperforms Classy, achieving an average classification accuracy of 94%, a 4% false positive rate, and leaving only 2% of samples unlabeled.

Item: A fault tolerant data structure for Peer-to-Peer range query processing (University of New Brunswick, 2015) Mirikharaji, Zahra; Nickerson, Bradford

We present a fault tolerant dynamic data structure, based on a constant-degree Distributed Hash Table called FissionE, that supports orthogonal range search in d-dimensional space. A publication algorithm, which distributes data objects among all nodes in the network, is described, along with a search algorithm that processes range queries and reports all objects in range to the query issuer. The worst-case orthogonal range search cost in our data structure with n nodes is O(log n + m) messages plus reporting cost, where m is the minimum number of nodes intersecting the query. We have proved that in our data structure the cost of reporting data in range to the query issuer is ∑_{i=1}^{m} ⌈K_i/B⌉ · O(log n) ∈ O((K/B + m) log n) messages, where K is the number of points in range, K_i is the number of points in range stored at node i, and B is the number of points fitting in one message. Storing d copies of each data object on d different nodes provides redundancy for our scheme. This redundancy permits completely answering a query in the case of simultaneous failure of d − 1 nodes.
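The reporting-cost bound above sums ⌈K_i/B⌉ messages over the m intersecting nodes, each message then routed in O(log n) hops. A small sketch of the message count (the node sizes and message capacity below are illustrative):

```python
from math import ceil

def report_messages(points_per_node, B):
    # one message carries up to B points; node i holding K_i in-range
    # points sends ceil(K_i / B) messages to the query issuer
    return sum(ceil(k / B) for k in points_per_node)

# m = 3 intersecting nodes holding K_1=10, K_2=3, K_3=25 points; B = 8
assert report_messages([10, 3, 25], 8) == 7  # 2 + 1 + 4 messages
```

The ceiling per node is why the bound carries the "+ m" term: even a node with a single in-range point costs one full message.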
Results of our experimental simulation with up to 12,288 nodes show the practical applicability of our data structure.

Item: A framework for developing adaptive service compositions (University of New Brunswick, 2018) Bashari, Mahdi; Du, Weichang; Bagheri, Ebrahim

This thesis proposes a framework for automatic generation of self-healing service compositions which can recover from functional and non-functional failures. To this end, it first proposes an automated method for generating a service composition that enables a user to build the composition by selecting a set of desired features, and second, it proposes a method for adapting the generated service composition to recover autonomously from service failures or non-functional constraint violations. The proposed service composition method uses software product line engineering concepts to build a repository of features and link them to their corresponding services. Using this repository, it applies AI planning to build a workflow of service interactions based on the requirements, then uses concepts from partial-order planning to optimize the generated workflow. Eventually, the generated workflow is converted to structured and executable BPEL code. The proposed adaptation method extends the composition software product line into a dynamic software product line, capable of re-selecting the features of a running service composition so it can continue with limited features after a service failure or a violation of critical non-functional requirements. A method has been proposed which uses linear regression to determine the effect of features on the non-functional properties of a service composition.
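The regression step just described estimates how a feature's presence shifts a non-functional property. For a single binary feature, the least-squares slope reduces to the difference of means; the measurements below (response times with a hypothetical caching feature on/off) are invented for illustration.

```python
# One-feature least-squares sketch: slope of metric ~ feature (0/1).
def feature_effect(present, metric):
    n = len(present)
    mean_x = sum(present) / n
    mean_y = sum(metric) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(present, metric))
    var = sum((x - mean_x) ** 2 for x in present)
    return cov / var  # estimated effect of enabling the feature

# response time (ms) across runs with the caching feature off (0) / on (1)
x = [0, 0, 1, 1]
y = [200.0, 220.0, 120.0, 140.0]
assert feature_effect(x, y) == -80.0  # caching lowers response time ~80 ms
```

With many features, the same idea extends to multiple regression, giving the per-feature coefficients that the pseudo-boolean optimizer can then trade off against each other.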
Knowing how each feature affects the non-functional requirements, a method has been proposed which reduces the problem of finding an alternative set of features, one that recovers the service composition from a service failure or non-functional requirement violation, to a pseudo-boolean optimization problem, which can then be solved. An online tool suite realizing the proposed framework has been implemented, and the usability, effectiveness, and reliability of the framework have been evaluated with extensive experiments.

Item: A framework for migration of conventional client-server software systems to cloud (University of New Brunswick, 2014) Zheng, Jianbo; Du, Weichang

As an emerging model for the delivery of software services, Software as a Service (SaaS) has become a trend in the software industry due to its low investment, flexibility, and accessibility. However, migrating conventional client-server software systems and applications to SaaS can involve complicated processes. This thesis proposes a framework named A2SF to help software developers migrate conventional client-server applications to high-quality SaaS-based applications in cloud environments, with multi-tenancy support, without re-developing or modifying the original applications. The migration framework consists of four components: service proxy, data proxy, tenant management, and cloud resources management. The four framework components, together with an original client-server application, can be seamlessly deployed on the cloud as SaaS software. A prototype of A2SF has been implemented on the Amazon AWS cloud platform.
Based on A2SF, the thesis also describes a general cloud migration process for client-server applications and presents a case study of migrating a real-world client-server application to the Amazon AWS cloud.

Item: A framework to process and exchange logical rules in multiple rule languages (University of New Brunswick, 2018) Akbari, Ismail; Biletskiy, Yevgen; Du, Weichang

Web rule languages have been developed for the Web-based interchange of rules, in particular business rules, business policies, and any business or application logic that can be presented with rules. The primary goal of this dissertation is to create methods for rule interchange between selected rule markup languages, as well as to develop a rule engine for a subset of W3C's RIF language. Logical rule interchange is the act of transforming rules presented in one rule language into another; here, interchange is done from the Notation3 (N3), POSL, RuleML, and SWRL rule languages to RIF-BLD. To enable rule interchange in the Semantic Web, this dissertation proposes a framework containing rule grammars, parsers, visualizers, and translators, as well as a rule engine. A grammar, parser, and rule visualizer are developed for each of the N3, POSL, and RIF-BLD languages, along with rule translators from N3, POSL, SWRL, and RuleML to RIF-BLD. As a central component, a rule engine for the RIF-BLD language has been developed, comprising both forward and backward reasoning. All the grammars, parsers, visualizers, translators, and the rule engine are part of the framework. The translators and the rule engine have been evaluated separately with various use cases and with a case study on the independently provided Port Clearance Rules.

Item: A genetic-algorithm-based solution for HTTP-based malware signature generation (University of New Brunswick, 2014) Pourafshar, Amir; Ghorbani, Ali

The rising prevalence of malware has become the most serious threat to Internet security.
To minimize the devastating impact of this threat, many malware detection strategies and systems have been developed in recent years. This thesis presents a novel malware signature generation and evolution system to detect never-before-seen malware. We focus on automatic generation of evolved signatures for HTTP-based malware traces based on the features and structure of currently known malware. The idea is that we can evolve signatures of known malware to predict the structure of future malware traces, since new malware usually inherits some of its characteristics and structure from its predecessors. We implemented a proof-of-concept version of our proposed evolutionary signature generation system, and datasets of malicious and legitimate network traffic were used to evaluate it. Results from the experiments show the system's ability to detect an acceptable portion of new, unknown malware samples while maintaining a low false alarm rate. Using the base and evolved signatures together increased the average detection rate on unknown malicious traces from 38.4% to 50.8%, while the average false positive rate of the evolved signature sets is 2.7 × 10⁻³.

Item: A meta-learning approach for evaluating the effect of software development policies (University of New Brunswick, 2017) Stewart, James Ashley; Tassé, Josée

Delivering high-quality software on time and on budget is a challenging endeavor, but it can be made more likely by adhering to an approach where guidance is provided through software development policies. Software development policies represent standards and best practices that a company has chosen to follow throughout its software development effort. For our purposes, a software policy is a statement of conduct intended to guide and constrain development activities.
Policies can be written to capture company guidelines, industry best practices, empirical research, and even past experience. A simple example of a policy might read "a preliminary design must be completed before implementation begins." Policies help ensure environmental conditions that are conducive to a successful outcome. Depending on the situation, however, the policies in use may not have the expected effect, and currently there is no formal way to evaluate a company's policy set without resorting to extensive experimentation or a case study on each policy. We propose a method that monitors weekly success indicators on project aspects such as quality, time, budget, and morale. The policies in use are then evaluated against these indicators, resulting in a summary of those policies thought to impact process performance. Due to the many complexities of this problem (e.g., policy interactions, delayed effects of changes), our method combines several different analysis techniques to yield a more complete solution. Our set of analysis methods currently includes: a form of linear regression adapted for greater sensitivity; a check that extreme values coincide; a trend analysis that detects whether data generally deviates in the same (or opposite) direction; and a special check adapted specifically for discrete measures. The results from each method are then combined using a meta-learner that compares the similarity of the ranked results produced by each individual technique and provides a single indicator of how strongly they agree. To ensure our method works and is practical, we validated it against industry data from a leading Canadian business-solutions provider.
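The meta-learner above compares the policy rankings produced by the individual techniques and reports how strongly they agree. A Spearman-footrule-style agreement score for two rankings can be sketched as follows; the policy names are invented, and this is an illustration of rank agreement in general, not the thesis's exact combiner.

```python
# Rank-agreement sketch: 1.0 = identical orderings, 0.0 = maximally
# displaced (Spearman footrule, normalized by its maximum n*n // 2).
def rank_agreement(ranking_a, ranking_b):
    pos_b = {p: i for i, p in enumerate(ranking_b)}
    displacement = sum(abs(i - pos_b[p]) for i, p in enumerate(ranking_a))
    n = len(ranking_a)
    return 1.0 - displacement / (n * n // 2)

a = ["design-first", "code-review", "daily-standup"]
b = ["design-first", "daily-standup", "code-review"]
assert rank_agreement(a, a) == 1.0
assert 0.0 < rank_agreement(a, b) < 1.0  # partial agreement
```

High agreement across techniques lends confidence to a policy's estimated effect; low agreement flags it for the kind of further investigation the abstract describes.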
Despite the many challenges inherent in real-world data (e.g., missing, inconsistent, incorrect, biased, sparse, and limited data), our validation work indicates that our method can identify more potential effects than traditional approaches, especially the subtler, weaker effects, which can serve as a trigger for further investigation. These results should be of special interest to project managers in their efforts to deliver successful projects.

Item: A multi-level matching approach applied to a semantic based dating site (University of New Brunswick, 2013) Sharawat, Chirag; Kent, Kenneth

Online dating, or internet dating, has become a more common, more convenient, and quicker medium for people to find potential matches for developing different types of relationships (personal, romantic, or sexual) than searching for partners in conventional ways. A common parameter across studies of human partner selection has been the physical attractiveness of individuals, and many past studies predict that, in making a realistic social choice, an individual chooses a partner similar in socially desirable traits such as education level, income level, and other credentials. Users may seek out online dating websites as a means of finding potential partners more quickly, and producing a set of matches that a person will find suitable is one of the major challenges in the online dating industry. Users enter information about themselves that is then used to identify other users as potential matches; however, many users become frustrated because they deem the list of potential matches unsuitable, leading customers to cancel their memberships and give the websites poor reviews. This report addresses how leading-edge semantic web technologies can provide better matches for dating in a more efficient and convenient way. These techniques consider more than just parameter-based criteria (e.g.
females between the ages of 21 and 26); the information provided is used not only to eliminate unsuitable matches but also to infer additional properties of the user that are then used in the matching process.

Item: A multi-sense context-agnostic definition generation model evaluated on multiple languages (University of New Brunswick, 2020) Kabiri, Arman; Cook, Paul

Definition modelling is a recently-introduced task in natural language processing (NLP) which aims to predict and generate dictionary-style definitions for any given word. Most prior work on definition modelling has not accounted for polysemy (the linguistic phenomenon in which a word can carry multiple meanings when used in various contexts), or has done so by considering definition modelling for a target word in a given context. In contrast, in this study, we propose a context-agnostic approach to definition modelling, based on multi-sense word embeddings, that is capable of generating multiple definitions for a target word. In further contrast to most prior work, which has primarily focused on English, we evaluate our proposed approach on fifteen different datasets covering nine languages from several language families. To evaluate our approach we consider several variations of BLEU, a widely-used evaluation metric initially introduced for machine translation, adapted to definition modelling. Our results demonstrate that our proposed multi-sense model outperforms a single-sense model on all fifteen datasets.

Item: A novel evasive PDF malware detection model based on stacking learning (University of New Brunswick, 2021-12) Issakhani, Maryam; Lashkari, Arash Habibi

Over the last few years, the Portable Document Format (PDF) has become the most popular content-presentation format among users due to its extraordinarily flexible and easy-to-work-with features. However, advanced PDF features such as JavaScript injection and file embedding make PDFs an attractive target for attackers to exploit.
Due to the complex PDF structure and the sophistication of attacks, traditional detection approaches such as anti-virus products are ineffective, as they rely on signature-based techniques. Various research works take a different direction and attempt to utilize AI technologies such as machine learning (ML) and deep learning (DL) to detect malicious PDF files. Despite the results from the research community, evasive malicious PDF files remain a security threat. This research attempts to address this gap by proposing a novel framework that stacks ML models for detecting evasive malicious PDF files. In addition, we evaluated our solution using two different datasets: Contagio and a newly generated evasive PDF dataset. In the first evaluation, we achieved an accuracy of 99.89% and an F1 score of 99.86%, better than the performance of existing models. Moreover, we re-evaluated our framework using our new evasive PDF dataset, an improved version of Contagio, to verify our solution further, achieving 98.69% accuracy and a 98.77% F1 score, demonstrating the effectiveness of our approach. The experimental results, along with the new dataset, show that our model could be applied in practice.

Item: A novel transformer-based multi-step approach for predicting common vulnerability severity score (University of New Brunswick, 2024-06) Bahmanisangesari, Saeid; Ghorbani, Ali A.; Isah, Haruna

The timely prediction of Common Vulnerability Severity Scores (CVSS) following the release of Common Vulnerabilities and Exposures (CVE) announcements is crucial for enhancing cybersecurity responsiveness. A delay in acquiring these scores may make it more difficult to prioritize risks effectively, resulting in the misallocation of resources and delayed mitigating actions.
Prolonged exposure to untreated vulnerabilities also raises the possibility of exploitative attacks, which could lead to serious security breaches that compromise data integrity and harm users and organizations. This thesis develops a multi-step predictive model that leverages DistilBERT, a distilled version of the BERT architecture, and artificial neural networks (ANNs) to predict CVSS scores prior to their official release. Using a dataset from the National Vulnerability Database (NVD), the research examines the effectiveness of incorporating contextual information from CVE source identifiers and the benefits of incremental learning in improving model accuracy. The models achieved better results than the top-performing models in other works, with an average accuracy of 91.96% in predicting CVSS category scores and an average F1 score of 91.87%. The results demonstrate the model's capability to predict CVSS scores effectively across multiple categories, thereby potentially reducing response time to cybersecurity threats.

Item: A parallel integrated index for spatio-temporal textual search using Tries (University of New Brunswick, 2019) Arseneau, Yoann S. M.; Nickerson, Bradford; Ray, Suprio

The proliferation of location-enabled devices and the increasing use of social media platforms are producing a deluge of multi-dimensional data. Novel index structures are needed to efficiently process massive amounts of geotagged data and to promptly answer queries with textual, spatial, and temporal components. Existing approaches to spatio-textual data processing either use separate spatial and textual indices, or a combined index that integrates an inverted index with a tree data structure such as an R-tree or Quadtree. These approaches, however, do not integrate temporal, spatial, and textual data together.
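One common way to unify several dimensions in a single binary trie key is bit interleaving (Z-order): nearby values in each dimension share long key prefixes, so they land close together in the trie. This is a sketch of the general idea, not necessarily the thesis's exact encoding, and the 8-bit width is an illustrative assumption.

```python
# Interleave two fixed-width unsigned values into one Z-order key,
# most significant bits first.
def interleave(a, b, bits=8):
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((a >> i) & 1)  # bit from dimension a
        key = (key << 1) | ((b >> i) & 1)  # bit from dimension b
    return key

assert interleave(0, 0) == 0
assert interleave(0b11111111, 0) == 0b1010101010101010
```

With three dimensions (text prefix, location, time) the same trick interleaves three bit streams, letting one trie descent prune on all components of a query at once.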
We propose a novel integrated index called the Spatio-temporal Textual Interleaved Trie (STILT), which unifies spatial, textual, and temporal components within a single structure. STILT is a multi-dimensional binary-trie-based index that interleaves text, location, and time data in a space-efficient manner. It supports dynamic and parallel indexing as well as concurrent searching. Extensive evaluation demonstrates that STILT is significantly faster than state-of-the-art approaches in terms of index construction time and query latency.

Item: A Phishing e-mail detection approach using machine learning techniques (University of New Brunswick, 2017) Mbah, Kenneth Fon; Ghorbani, Ali

According to the APWG reports of 2014 and 2015, the number of unique phishing e-mail reports received from consumers increased tremendously, from 68,270 e-mails in October 2014 to 106,421 e-mails in September 2015. This significant increase is proof of the prevalence of phishing attacks and the high rate of damage they have caused to Internet users. Because little attention has been paid in the literature to specifically detecting phishing e-mails related to advertising and pornography, attackers have become extremely adept at using these means of attraction to track users, adjusting their attacks based on user behaviour and hot topics extracted from community news and journals. We focus on detecting deceptive e-mail, a form of phishing attack, by proposing a novel framework to accurately identify not only e-mail phishing attacks but also advertising or pornographic e-mails considered attractive vectors for launching phishing. Our approach, known as the Phishing Alerting System (PHAS), can detect and alert on all types of deceptive e-mails to help users in decision making.
Using a well-known e-mail dataset, and based on our extracted features, we obtain about 93.11% accuracy with machine learning techniques such as the J48 decision tree and KNN. Furthermore, we evaluated our system, built on the features above, and obtained approximately the same accuracy when using the same dataset as input.
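A sketch of the pipeline this last abstract describes: extract simple numeric features from an e-mail, then classify with nearest-neighbour. The features, training rows, and 1-NN choice are invented for illustration; the thesis uses its own feature set with J48 and KNN.

```python
# Feature extraction + 1-nearest-neighbour phishing sketch.
def features(email):
    text = email.lower()
    return (text.count("http"),        # link count
            int("urgent" in text),     # urgency cue
            int("unsubscribe" in text))  # ad/marketing cue

def knn_1(train, email):
    fv = features(email)
    # label of the training row with the smallest squared distance
    return min(train,
               key=lambda row: sum((x - y) ** 2 for x, y in zip(row[0], fv)))[1]

train = [(features("urgent: verify http://x http://y"), "phishing"),
         (features("team lunch friday, no links"), "legitimate")]

assert knn_1(train, "urgent! click http://bad.example") == "phishing"
assert knn_1(train, "see you at the meeting") == "legitimate"
```

Real systems use many more features and larger k (or tree learners like J48), but the structure is the same: e-mails become fixed-length vectors, and a learned decision boundary separates deceptive from legitimate mail.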