Faculty of Computer Science (Fredericton)

Distributive continuous profiling for IoT devices
by Miraqa Safi, The proliferation of heterogeneous IoT devices connected to the internet creates security and operational challenges for network administrators and industries, which must detect, identify and monitor millions of interconnected IoT devices. Network administrators need to understand which IoT devices have joined or are trying to connect to their network, which devices are functional, which devices need security updates, and which devices are vulnerable to specific attacks. Furthermore, limited storage and computing power, small cryptographic keys for cryptographic operations, and common vulnerabilities in specific devices create points of intrusion for hackers. Industries need to identify and monitor the specific behavior of connected devices and isolate suspected and vulnerable devices within the network for further monitoring. In this thesis, we propose a distributive continuous profiling model that identifies IoT devices at the local node, maps them to their common vulnerabilities, and continuously updates their profiles. We also provide a comprehensive review of various IoT device profiling methods and a clear taxonomy of IoT profiling techniques based on different security perspectives. We investigated and analyzed numerous current IoT device vulnerabilities and multiple features, and provide detailed information useful for implementing risk assessment and mitigation in an organizational network. We used a hybrid set of features, extracting 58 features from the network traffic generated by IoT devices, and introduced 23 new features that let the profiling approach identify IoT devices with improved accuracy and shorter training time than existing methods. We experimented with 18 machine learning classifiers on three publicly available datasets covering 81 IoT and six non-IoT devices. In the proposed method, the random forest and decision tree classifiers outperform the others; both achieve average accuracy, precision, recall, and F1-score above 90% with a short training time. The decision tree requires less time to train the model, which helps continuously update the devices' profiles.
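As a rough illustration of the final classification step described above (not the thesis's implementation), the sketch below trains the two best-performing classifier families on synthetic data; scikit-learn and the three stand-in flow features are assumptions of this sketch, not the 58 traffic features actually extracted.

```python
# Minimal sketch: device classification from per-flow features.
# The features and data are synthetic stand-ins for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Hypothetical per-flow features: mean packet size, mean inter-arrival
# time, count of distinct destination ports.
X = rng.random((1000, 3))
y = rng.integers(0, 4, size=1000)  # four mock device classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (RandomForestClassifier(random_state=0),
              DecisionTreeClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
```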
Divisible load scheduling on multi-level processor trees
by Mark Lord, Divisible Load Theory (DLT) is an effective tool for blueprinting data-intensive computational problems. Heuristic algorithms have been proposed in the past to solve for a DLS (Divisible Load Schedule) with result collection on heterogeneous star networks. However, scheduling on heterogeneous multi-level trees with result collection is still an open problem. In this thesis, new heuristic algorithms for scheduling divisible loads on heterogeneous multi-level trees (single- and two-installment), including result collection, are presented. Experiments are performed on both random networks and cluster networks. Results show that scheduling using multi-level trees produces lower solution times compared to the traditional star network in the majority of cases; however, the efficiency of resources in multi-level trees tends to be lower, i.e., more processors were used. Cluster results with multi-level trees are found to outperform the star when there are enough clusters available to provide good overlap of communication and computation. Experiments on random networks with varying levels of heterogeneity of resources show that multi-level trees outperform star networks in the majority of cases. Experiments were also conducted comparing schedules with and without latency costs; all schedules where latency was considered had significantly lower solution times and higher efficiency of resources. Overall, scheduling on single-installment multi-level trees in either clusters or random networks had the lowest solution times, but the star had the highest efficiency of resources., Degree name on title page is mislabeled as "Master of Computer Science In the Graduate Academic Unit of in the Graduate Academic Unit of Computer Science". Also, pagination is wrong. The last page of the front matter (second page of List of Figures) is paginated with an Arabic number 1 (one) instead of a Roman number viii (eight). i.e. Page viii is labeled as page 1, page 1 is labeled as page 2, …, page 89 (last page) is labeled as page 90. Electronic Only. (UNB thesis number) Thesis 8661 (OCoLC) 960871143, M.C.S., University of New Brunswick, Faculty of Computer Science, 2011.
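For context, the classical single-installment star schedule without result collection has a simple closed form, which the thesis setting (multi-level trees with result collection) generalizes and which has no comparably simple solution. A minimal sketch under that classical model, with illustrative rates:

```python
# Classical single-installment divisible load schedule on a star network,
# *without* result collection. w[i]: time to compute one load unit on
# worker i; z[i]: time to transmit one load unit to worker i. All rates
# below are illustrative.
def star_schedule(w, z):
    # Optimality condition (all workers finish at the same time):
    # alpha[i] * w[i] = alpha[i+1] * (z[i+1] + w[i+1])
    alpha = [1.0]
    for i in range(1, len(w)):
        alpha.append(alpha[-1] * w[i - 1] / (z[i] + w[i]))
    total = sum(alpha)
    return [a / total for a in alpha]  # load fractions summing to 1

print(star_schedule(w=[2.0, 3.0, 4.0], z=[0.5, 0.5, 0.5]))
```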
Domain generation algorithm (DGA) detection
by Shubhangi Upadhyay, Domain names play a crucial role today: they are designed so that humans can refer to the access points they need, and every domain name has certain characteristics that justify its existence. Domain generation algorithms (DGAs) were developed to generate domain names algorithmically, removing the need to design them manually. DGAs are immune to static prevention methods like blacklisting and sinkholing, and attackers deploy them as part of highly sophisticated tactics to compromise end-user systems and gain control of targets for spreading malware. Multiple detection attempts have been made using lexical feature analysis and domain query responses checked against blacklists or sinkholes, and some of these techniques have been quite effective. In this research we design a framework that detects DGAs even in real network traffic, using features studied from legitimate domain names in both static and real traffic, with feature extraction as the key of the proposed framework. The detection process consists of detection, prediction and classification, attaining a maximum accuracy of 99% without using neural networks or deep learning techniques.
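As an illustration of the kind of lexical feature extraction such detection builds on, here is a sketch computing three common DGA indicators; these are stand-ins, not the thesis's exact feature set:

```python
# Lexical features often used to separate DGA names from legitimate ones:
# label length, character entropy, and digit ratio.
import math
from collections import Counter

def lexical_features(domain):
    name = domain.split(".")[0].lower()
    counts = Counter(name)
    entropy = -sum((c / len(name)) * math.log2(c / len(name))
                   for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in name) / len(name)
    return [len(name), entropy, digit_ratio]

print(lexical_features("google.com"))        # short, low entropy, no digits
print(lexical_features("xj4k9qz2vb7h.net"))  # high entropy, many digits
```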
Dynamic monitor allocation in the IBM J9 virtual machine
by Marcel Dombrowski, With the Java language and sandboxed environments becoming increasingly popular, research needs to be conducted into improving the performance of these environments while decreasing their memory footprints. This thesis focuses on a dynamic approach to growing monitors for objects in order to reduce the memory footprint and improve the execution time of the IBM Java Virtual Machine. According to the Java Language Specification, every object needs to have the ability to be used for synchronization. This new approach grows monitors only when required. The impact of this approach on performance and memory has been evaluated using different benchmarks, and future work is also discussed. On average, a performance increase of 0.6% and a memory reduction of about 5% have been achieved with this approach.
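A language-neutral sketch of the grow-on-demand idea follows (written in Python for illustration); it shows the concept only and is not the J9 implementation:

```python
# Instead of reserving a monitor per object at allocation time, create a
# monitor only on the first synchronization attempt on that object.
import threading

class LazyMonitorTable:
    def __init__(self):
        self._monitors = {}            # object id -> monitor
        self._table_lock = threading.Lock()

    def monitor_for(self, obj):
        mon = self._monitors.get(id(obj))   # fast path: already grown
        if mon is None:
            with self._table_lock:          # slow path: grow on first use
                mon = self._monitors.setdefault(id(obj), threading.RLock())
        return mon

table = LazyMonitorTable()
obj = object()
with table.monitor_for(obj):
    print("synchronized on", id(obj))
```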
ELF-based code storage support for the Eclipse OMR Ahead-of-Time compiler: a WebAssembly use case
by Damian Diago D’monte, Over the years, Ahead-of-Time (AOT) compilation has drawn significant attention in the research community due to its ability to accelerate the startup of a runtime system. AOT compilation involves compiling, persisting and re-using compiled code in later runs, thereby avoiding costly re-compilation. Typically, a program is compiled and translated into machine language or native code, which is persisted in a binary container format. At present, the most commonly used code container options are either cache-based or object-based. Eclipse OMR is a collection of components used to build robust language runtimes. The WebAssembly AOT compiler (wabtaot) is constructed using the Eclipse OMR JitBuilder library. Currently, the wabtaot compiled code is stored in a prototype of the Eclipse OMR shared cache. The goal of this research is to find a better lightweight AOT code storage option for resource-constrained systems. To achieve this goal, the Eclipse OMR ELF generation module is enhanced by implementing an ELF shared object generator that can store AOT data and be used as part of the Eclipse OMR AOT component. The design and implementation of the ELF shared object for Eclipse OMR is discussed and demonstrated using wabtaot. Evaluation is carried out by comparing the ELF shared-object approach with the existing shared cache in the wabtaot environment on metrics such as execution speed, memory footprint, file size and sharing. Execution-time trade-offs, lower memory consumption and compact ELF objects are observed, indicating the possibility of using lightweight ELF shared objects in resource-constrained systems.
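To illustrate the store-and-reload cycle that AOT code persistence relies on, here is a minimal sketch; a real implementation emits an ELF shared object, whereas this sketch uses a flat file with a JSON index purely to show the idea:

```python
# Persist compiled code blobs keyed by function name, then map one back
# in on a later run. The container format here is a toy stand-in for ELF.
import json, os

def store(path, blobs):                # blobs: {name: bytes}
    index, offset, body = {}, 0, b""
    for name, code in blobs.items():
        index[name] = (offset, len(code))
        body += code
        offset += len(code)
    with open(path, "wb") as f:
        header = json.dumps(index).encode()
        f.write(len(header).to_bytes(4, "little") + header + body)

def load(path, name):
    with open(path, "rb") as f:
        hlen = int.from_bytes(f.read(4), "little")
        off, size = json.loads(f.read(hlen))[name]
        f.seek(4 + hlen + off)
        return f.read(size)

store("aot.bin", {"fib": b"\x90\x90", "main": b"\xc3"})
print(load("aot.bin", "main"))         # reload a single compiled body
os.remove("aot.bin")
```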
Efficient and privacy-preserving AdaBoost classification framework for mining healthcare data over outsourced cloud
by Mahtab Davoudi, In recent years, the analysis and mining of electronic health records (EHRs) with the aid of machine learning (ML) algorithms have become a popular approach to improve the quality of patient care and increase the productivity and efficiency of healthcare delivery. A sufficient amount of data is needed for robust and more accurate decision-making systems based on machine learning algorithms. Due to the high volume of EHRs, many frameworks require outsourcing their data to cloud servers. However, cloud servers are not fully trusted, and releasing sensitive raw data might put individuals at risk. For example, in Canada, the University of Ontario Institute of Technology (UOIT), in collaboration with IBM, has implemented an online real-time analytic platform, Artemis¹. The Artemis framework stores patients' raw physiological and clinical information and is also used for online real-time analysis and data mining. While utilizing patients' sensitive healthcare data contributes to more accurate diagnoses, it raises security and privacy concerns. In 2019, 25 million patients were the victims of the American Medical Collection Agency (AMCA) data breach². As a result, preserving the privacy of sensitive health records is a pressing issue. A practical solution to ensure the security and privacy of the extreme volume of healthcare data is outsourcing encrypted data to the cloud servers. However, encryption increases the computational cost significantly. Adversaries may abuse healthcare data outsourced to cloud servers without encryption; thus a Privacy-Preserving (PP) model is required. Researchers have proposed various PP ML models based on different privacy techniques. Nonetheless, time efficiency in PP ML frameworks matters. In comparison to existing ML models, AdaBoost is a fast, simple, and versatile yet highly accurate classifier. Privacy-preserving techniques can restore the balance between data usage and data privacy; an inefficient privacy technique, by contrast, requires intensive computational power. To address these challenges, we conduct studies and experiments to propose an efficient and privacy-preserving classification framework for mining outsourced encrypted healthcare data. This thesis covers the AdaBoost learning process, classification, Homomorphic Encryption (HE), and the Paillier cryptosystem. The experimental results demonstrate the accuracy and efficiency of our framework. ¹ http://hir.uoit.ca/cms/?q=node/24 ² https://healthitsecurity.com/news/the-10-biggest-healthcare-data-breaches-of-2019-so-far
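For illustration, the additive homomorphism of the Paillier cryptosystem, which lets an untrusted server aggregate encrypted values without seeing them, can be demonstrated with a toy key. This is a sketch with deliberately insecure parameters, not the thesis's implementation:

```python
# Toy Paillier cryptosystem (tiny fixed primes; real keys use >=2048 bits).
import math, random

p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = encrypt(20), encrypt(22)
print(decrypt(a * b % n2))   # product of ciphertexts decrypts to 20+22=42
```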
Efficient classification of complex ontologies
by Weihong Song, Description logics (DLs) are knowledge representation languages that provide the theoretical underpinning for modern ontology languages such as OWL and serve as the basis for the development of ontology reasoners. The task of ontology classification is to compute the subsumption relationships between all pairs of atomic concepts in an ontology, which is the foundation for other ontology reasoning problems. There are two types of mainstream reasoners for ontology classification: (1) tableau-based reasoners usually support very expressive DLs, but they may not be efficient for large and highly cyclic ontologies; (2) consequence-based reasoners are typically significantly faster than tableau-based reasoners, but they support less expressive DLs, and it is difficult to extend them directly to support more expressive DLs. In this thesis, we propose a weakening and strengthening based approach for ontology classification, which aims to extend the capability of an efficient reasoner for a less expressive base language Lb (Lb-reasoner) to support a more expressive language. The approach approximates the target ontology by a weakened version and a strengthened version in Lb, whose subsumptions are a subset and a superset of the subsumptions of the target ontology, respectively. There are two cases: (1) the subsumptions of the strengthened ontology are the same as those of the target ontology; (2) the strengthened ontology may imply more subsumptions, and is therefore unsound. In case (1), which we call soundness-preserved strengthening, we classify only the strengthened ontology with the Lb-reasoner to get the final classification results. In case (2), which we call soundness-relaxed strengthening, a hybrid approach is employed: we first classify both the weakened and strengthened ontologies with the Lb-reasoner, and then use a full-fledged (hyper)tableau-based assistant reasoner to check whether the subsumptions implied by the strengthened ontology are also implied by the target ontology. We first study the general principles of applying weakening and strengthening to extend an Lb-reasoner for a DL language that has one more constructor than Lb, i.e., single extension. Then we study the combination of several single extensions for multiple extended constructors, i.e., multiple extension. Based on the general principles, we investigate two single extensions from the ALCH description language to ALCH(D)¯ and ALCHI with soundness-preserved strengthening, as well as a single extension from ALCH to ALCHO with soundness-relaxed strengthening. Then, we show how to combine them into multiple extensions from ALCH to ALCHI(D)¯, ALCHOI, ALCHO(D)¯, and ALCHOI(D)¯. The soundness and completeness of all the single and multiple extensions are proved. We also develop a prototype ALCHOI(D)¯ reasoner, WSClassifier, based on the proposed approach. We experiment with and evaluate WSClassifier on large and highly cyclic real-world ontologies such as the FMA and Galen ontologies, whose required languages are beyond the capability of current efficient consequence-based reasoners. The experiments show that on most of these ontologies, WSClassifier outperforms or significantly outperforms conventional (hyper)tableau-based reasoners.
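A schematic sketch of the soundness-relaxed hybrid strategy follows; classify_with_lb, entails and the ontology arguments are hypothetical stand-ins for the Lb-reasoner, the (hyper)tableau assistant reasoner, and real ontology objects:

```python
# Hybrid classification: the cheap Lb-reasoner bounds the answer from
# below (weakened ontology) and above (strengthened ontology); only the
# uncertain candidates in between go to the expensive full reasoner.
def hybrid_classify(weakened, strengthened, target,
                    classify_with_lb, entails):
    lower = set(classify_with_lb(weakened))       # certainly sound
    upper = set(classify_with_lb(strengthened))   # superset, maybe unsound
    result = set(lower)
    for subsumption in upper - lower:
        if entails(target, subsumption):          # verify each candidate
            result.add(subsumption)
    return result
```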
Efficient sub-genome neuroevolution via probabilistic selection of genes
by Aaron Lawrence Broad, Chimera, a novel sub-genome neuroevolution method, solves double pole balancing without velocity (DPNV), a modern version of the classic physical control machine-learning benchmark, in significantly fewer network evaluations than prior sub-genome neuroevolution methods. Neural networks are bio-mimicking, distributed, fault-tolerant computers, applied to cashing cheques, natural language processing, and many other deep learning applications [11, 16]. Deep learning repeatedly modifies a single neural network to produce known desired outputs to training example inputs. Alternatively, neuroevolution (NE) differentially reproduces candidate networks, based on their relative ability to produce training example output, or some other fitness measure [14]. Cooperative NE methods, herein referred to as sub-genome methods, select sub-genomes, or partial networks, rather than whole genomes, or complete networks. Selecting sub-genomes is theorised to select for many independent specialisations, whereas genome selection converges on a single generalisation [5–7, 13]. Sub-genome neuroevolution has historically been made more efficient by selecting smaller, more numerous genes (or specialisations). Symbiotic Adaptive Neuroevolution (SANE) randomly composes genomes from a population of chromosomes and assigns each a fitness value. SANE then selects chromosomes by the average fitness of every genome they were a part of [13]. Enforced Sub-populations (ESP) improves upon SANE by maintaining separate sub-populations of chromosomes for each of the x chromosomes required to form a genome [7]. Cooperative Synapse Neuroevolution (CoSyNE) further improves upon ESP by maintaining sub-populations of genes, one for each gene in a complete genome, rather than sub-populations of multi-gene chromosomes. CoSyNE first selects genomes, but then shuffles the gene sub-populations of selected genomes [5, 6]. This thesis proposes a novel sub-genome neuroevolution method, Chimera, and compares this new method with prior methods for solving DPNV. Chimera simply selects genes from gene sub-populations, with a probability proportional to the fitness of each gene’s genome. Chimera combines CoSyNE’s gene matrix population data structure [5] with ESP’s intra-sub-population reproduction [7] and a probabilistic [14] variant of the fitness-proportional selection originally abandoned by SANE [13]. Chimera uses significantly fewer network evaluations to solve DPNV than any of the prior sub-genome methods examined: SANE, ESP, or CoSyNE. While Chimera uses fewer evaluations, it can use more evaluation steps in tasks where fitness is proportional to total evaluation steps., Electronic Only.
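As a rough sketch of the Chimera-style selection summarized above (the population sizes, values and fitnesses are illustrative, and this is not the thesis's code):

```python
# Assemble a genome by drawing, for each gene position, one gene from
# that position's sub-population with probability proportional to the
# fitness of the genome the gene last appeared in.
import random

def assemble_genome(subpops, fitnesses):
    """subpops[i][j]: candidate value for gene position i, slot j;
    fitnesses[i][j]: fitness of the genome that gene last belonged to."""
    genome = []
    for genes, fits in zip(subpops, fitnesses):
        genome.append(random.choices(genes, weights=fits, k=1)[0])
    return genome

subpops = [[0.1, 0.5, 0.9], [-0.3, 0.2, 0.7]]     # two gene positions
fitnesses = [[1.0, 4.0, 2.0], [3.0, 1.0, 5.0]]
print(assemble_genome(subpops, fitnesses))
```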
Efficient text search with spatial constraints
by Dan Han, This thesis presents a search engine called TexSpaSearch that can search text documents having associated positions in space. We defined three query types Q1(t), Q2(t, r) and Q3(p, r) that can search documents with queries containing text t, position p and radius r components. We indexed a sample herbarium database of 40,791 records using a novel R*-tree and suffix tree data structure to achieve efficient text search with spatial data constraints. Significant preprocessing was performed to transform the herbarium database to a form usable by TexSpaSearch. We built unique data structures used to index text with spatial attributes that simultaneously support Q1, Q2 and Q3 queries. This thesis presents a novel approach for simplifying polygon boundaries for packing into R*-tree leaf nodes. Search results are ranked by a novel modified Lucene algorithm that supports free form text indexing in a general way. A novel ranking of Q2 search results combines the text and spatial scores. The architecture of a prototype Java based web application that invokes TexSpaSearch is described. A theoretical analysis shows that TexSpaSearch requires O(A²|b|) average time for Q1 search, where A is the number of single words in the query string t, and |b| is the average length of a subphrase in t. Q2 and Q3 require O(A²|b| + Z log_M Vn + y) and O(log_M Vn + y) time respectively, where Z is the number of point records in the list P of text search results, Vn is the number of data objects indexed in the R*-tree for n records, M is the maximum number of entries of an internal node in the R*-tree, and y is the number of leaf nodes found in range in a Q3 query. Testing was performed with 20 test Q1 queries to compare TexSpaSearch to a Google Search Appliance (GSA) for text search. Results indicate that the GSA is about 45.5 times faster than TexSpaSearch. The same 20 test queries were combined with a Q2 query radius r = 5, 50 and 500 m. Results indicate Q2 queries are 22.8 times slower than Q1 queries. For testing Q3 queries, 15 points were chosen in 15 different N.B. counties. The average Tc, Ts and Te values in the Q3 tests were 191.5 ms, 3603.2 ms and 4183.9 ms respectively, and the average value of Npt + Npl was 1313.4.
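As an illustration of combining text and spatial scores for Q2-style ranking (the linear weighting and the exponential decay below are hypothetical choices, not the thesis's formula):

```python
# Rank hits by a weighted blend of text relevance and spatial proximity.
import math

def combined_score(text_score, distance_m, radius_m, text_weight=0.7):
    # Spatial score decays from 1 at the query point toward 0 at the radius.
    spatial_score = math.exp(-3.0 * distance_m / radius_m)
    return text_weight * text_score + (1 - text_weight) * spatial_score

hits = [("record A", 0.9, 450.0), ("record B", 0.6, 20.0)]  # (name, text, dist)
ranked = sorted(hits, key=lambda h: -combined_score(h[1], h[2], 500.0))
print([name for name, *_ in ranked])
```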
Enhancing the MMD algorithm in multi-core environments
by Michael Schlösser, The work done in this thesis enhances the MMD algorithm in multi-core environments. The MMD algorithm, a transformation-based algorithm for reversible logic synthesis, is based on the works introduced by Maslov, Miller and Dueck and their original, sequential implementation. It synthesises a formal function specification, provided by a truth table, into a reversible network and is able to perform several optimization steps after the synthesis. This work concentrates on one of these optimization steps, template matching. This approach reduces the size of the reversible circuit by replacing a number of gates that match a template with an implementation of the same function that uses fewer gates. Smaller circuits have several benefits, since they need less area and are less costly. The template matching approach introduced in the original works is computationally expensive, since it tries to match a library of templates against the given circuit: for each template at each position in the circuit, a number of different combinations have to be calculated at runtime, resulting in high execution times, especially for large circuits. In order to make the template matching approach more efficient and usable, it has been reimplemented to take advantage of modern multi-core architectures such as the Cell Broadband Engine and Graphics Processing Units. For this work, two algorithmically different approaches, each trying to play to one multi-core architecture's strengths, have been analyzed and improved. For the analysis, these approaches were cross-implemented on the two target hardware architectures and compared to the original parallel versions. Important metrics for this analysis are the execution time of the algorithm and the result of the minimization with the template matching approach. It could be shown that the algorithmically different approaches produce the same minimization results, independent of the hardware architecture used. However, both cross-implementations also show significantly higher execution times, which makes them practically irrelevant. The results of the first analysis and comparison led to the decision to enhance only the original parallel approaches. Using the same metrics for successful enhancements as mentioned above, it could be shown that improving the algorithmic concepts and exploiting the capabilities of the hardware leads to better execution times and minimization results compared to the original implementations., Electronic Only. (UNB thesis number) Thesis 8689. (OCoLC) 960950379, M.C.S., University of New Brunswick, Faculty of Computer Science, 2011.
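A minimal sketch of the template-matching idea follows; the gate representation and the two toy cancellation rules are illustrative stand-ins, not the MMD template library:

```python
# Scan a circuit (a list of gates) for a window matching a template's
# pattern and replace it with the template's shorter equivalent; repeat
# until no template applies.
def apply_templates(circuit, templates):
    changed = True
    while changed:
        changed = False
        for pattern, replacement in templates:
            k = len(pattern)
            for i in range(len(circuit) - k + 1):
                if circuit[i:i + k] == pattern:
                    circuit = circuit[:i] + replacement + circuit[i + k:]
                    changed = True
                    break
            if changed:
                break
    return circuit

# Toy rules: a self-inverse gate applied twice cancels to nothing.
templates = [([("NOT", 0), ("NOT", 0)], []),
             ([("CNOT", 0, 1), ("CNOT", 0, 1)], [])]
print(apply_templates([("NOT", 0), ("NOT", 0), ("CNOT", 0, 1)], templates))
```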
Enhancing the usage of the Shared Class Cache
by Devarghya Bhattacharya, With the increasing popularity of the Java language and sandboxed environments, research needs to be conducted into improving the performance of these environments by decreasing the execution time as well as the memory footprint of an application. This thesis examines various critical data structures used by IBM's Java Virtual Machine (JVM) during the start-up phase for potential improvements. These data structures start small and expand as required in order to save space; however, growing them slows down the start-up of the JVM. This thesis describes how the data structures were optimized using the Shared Class Cache (SCC) in order to improve the execution time as well as the memory footprint of an application running on IBM's JVM. The impact of this approach on performance and memory has been evaluated using different benchmarks. On average, a performance increase of 6% and a memory reduction of about 1% have been achieved with this approach. The alterations made are completely automated, and the user requires no prior knowledge about the Java application or the VM to improve the performance of the deployed application. The only task left to the user is to activate the SCC., M.C.S. University of New Brunswick, Faculty of Computer Science, 2017.
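As an illustration of the pre-sizing idea behind this optimization, here is a sketch in which a plain file stands in for the Shared Class Cache and the sizes are illustrative:

```python
# A structure that normally starts small and grows is instead allocated
# at the final size observed in a previous run, persisted across runs.
import os

HINT_FILE = "table_size.hint"   # stand-in for a cached size hint

def load_hint(default=16):
    try:
        with open(HINT_FILE) as f:
            return int(f.read())
    except (OSError, ValueError):
        return default           # first run: no hint yet

def save_hint(final_size):
    with open(HINT_FILE, "w") as f:
        f.write(str(final_size))

table = [None] * load_hint()                 # later runs start full-size
table.extend([None] * (4096 - len(table)))   # simulate growth this run
save_hint(len(table))                        # next start-up skips regrowth
os.remove(HINT_FILE)                         # cleanup for the demo
```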
