Faculty of Computer Science (Fredericton)

A multi-sense context-agnostic definition generation model evaluated on multiple languages
by Arman Kabiri, Definition modeling is a recently introduced task in natural language processing (NLP) which aims to predict and generate dictionary-style definitions for any given word. Most prior work on definition modeling has not accounted for polysemy (i.e., a linguistic phenomenon in which a word can imply multiple meanings when used in various contexts), or has done so by considering definition modeling for a target word in a given context. In contrast, in this study, we propose a context-agnostic approach to definition modeling, based on multi-sense word embeddings, that is capable of generating multiple definitions for a target word. In further contrast to most prior work, which has primarily focused on English, we evaluate our proposed approach on fifteen different datasets covering nine languages from several language families. To evaluate our approach we consider several variations of BLEU, a widely used evaluation metric initially introduced for machine translation that is adapted to definition modeling. Our results demonstrate that our proposed multi-sense model outperforms a single-sense model on all fifteen datasets.
A parallel integrated index for spatio-temporal textual search using Tries
by Yoann S. M. Arseneau, The proliferation of location-enabled devices and the increasing use of social media platforms are producing a deluge of multi-dimensional data. Novel index structures are needed to efficiently process massive amounts of geotagged data, and to promptly answer queries with textual, spatial, and temporal components. Existing approaches to spatio-textual data processing either use separate spatial and textual indices, or a combined index that integrates an inverted index with a tree data structure, such as an R-tree or Quadtree. These approaches, however, do not integrate temporal, spatial, and textual data together. We propose a novel integrated index called Spatio-temporal Textual Interleaved Trie (STILT), which unifies spatial, textual, and temporal components within a single structure. STILT is a multi-dimensional binary-trie-based index that interleaves text, location, and time data in a space-efficient manner. It supports dynamic and parallel indexing as well as concurrent searching. With extensive evaluation we demonstrate that STILT is significantly faster than the state-of-the-art approaches in terms of index construction time and query latency.
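The abstract does not spell out how STILT interleaves its dimensions, so the following is only a generic sketch of bit interleaving, the usual way to fold several fixed-width keys into one binary string that a binary trie can index. The function names and the equal bit width per dimension are assumptions made for illustration, not details taken from the thesis.

```python
def interleave_bits(values, width):
    """Interleave the low `width` bits of each integer in `values`.

    Bits are taken most significant first, one bit from each value per
    round, so keys that agree on the leading bits of every dimension
    share a common prefix (and hence a common path in a binary trie).
    """
    key = 0
    for bit in range(width - 1, -1, -1):
        for v in values:
            key = (key << 1) | ((v >> bit) & 1)
    return key


def key_bits(values, width):
    """Return the interleaved key as a zero-padded bit string."""
    total = len(values) * width
    return format(interleave_bits(values, width), f"0{total}b")


# Hypothetical 3-bit codes for (text, location, time).
print(key_bits([0b101, 0b010, 0b111], 3))  # 101011101
```

Because each round contributes one bit per dimension, a prefix of the interleaved key narrows all dimensions at once, which is what lets a single trie answer queries that constrain text, space, and time together.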
A profiling tool for exploiting the use of packed objects in java programs
by Umang Umesh Pandya, Packed objects is an experimental feature in the IBM J9 Virtual Machine. Packed objects can be used to gain greater control over the layout of objects in memory. Applications which use packed objects have greater flexibility when they work with memory structures that are not in Java code. The purpose of developing this profiling tool is to assist software developers in determining how much performance and/or memory gain they can achieve if they switch to packed objects from standard Java objects. This switch can result in reduced memory consumption by the application, since the use of packed objects in Java applications can reduce the amount of memory required to store objects and also increases the efficiency of caching. If the application accesses a lot of native data using JNI method calls, then the use of packed objects will eliminate marshaling/unmarshaling of native data into Java objects and thus eliminate redundant data copying, which is a chief limitation of existing methods used to access native data in Java applications, such as the JNI API and the Java New I/O API. The correctness and usefulness of the profiling tool are evaluated by running several test programs and benchmarks. Based on the results generated by the profiling tool for the benchmarks, the programs were modified to use packed objects instead of standard Java objects, and a performance gain was observed in terms of reduced memory consumption by the program that allocates large array objects of non-primitive data types and by the program that uses the JNI functions to access arrays of primitive data types from the native side.
A reinforcement learning approach to dynamic norm generation
by Hadi Hosseini, This thesis proposes a two-level learning framework for dynamic norm generation. This framework uses the Bayesian reinforcement learning technique to extract behavioral norms and domain-dependent knowledge in a certain environment and later incorporates them into the learning agents in different settings. Reinforcement learning (RL) and norms are mutually beneficial: norms can be extracted through RL, and RL can be improved by incorporating behavioral norms as prior probability distributions into learning agents. An agent should be confident about its beliefs in order to generalize them and use them in future settings. The confidence level is developed by checking two conditions: how familiar the agent is with the current world and its dynamics (including the norm system), and whether it has converged to an optimal policy. A Bayesian dynamic programming technique is implemented and then compared to other methods such as Q-learning and Dyna. It is shown that Bayesian RL outperforms other techniques in finding the best equilibrium for the exploration-exploitation problem. This thesis demonstrates how an agent can extract behavioral norms and adapt its beliefs based on the domain knowledge it has acquired through the learning process. Scenarios with different percentages of similarity and goals are examined. The experimental results show that the normative agent, having been trained in an initial environment, is able to adjust its beliefs about the dynamics and behavioral norms in a new environment, and thus it converges to the optimal policy more quickly, especially in the early stages of learning., (UNB accession number) Thesis 8607. (OCoLC)821799515, M.C.S. University of New Brunswick, Faculty of Computer Science, 2010
A sarcasm detection framework in Twitter and blog posts based on varied range of feature sets
by Hamed Minaee, This thesis addresses the problem of sarcasm detection with a framework designed to detect sarcastic blog and microblog posts. The framework consists of two components, each made up of sub-components including a crawler, preprocessing, and classification. Long-text sarcasm detection is a two-step process; in each step, we use several feature sets along with different classifiers. These feature sets are used to analyze each blog post as a whole as well as every isolated sentence. In the first step, the Scoring Component classifies the documents as sarcastic or non-sarcastic. Then, to find the sarcastic sentences in each sarcastic document, a Decision Tree is applied. Considering the difficulties in sarcasm detection, the Document Level Sarcasm Detection achieved a strong result: a precision of 75.7%. For short text, a Decision Tree is applied to classify tweets as sarcastic or non-sarcastic. A precision of 86.6% is obtained for this component, which is very good considering the difficulty of sarcasm detection as well as the inherent complexity of Twitter texts.
A semantic matchmaking system for online dating
by Emily Wilson, The popularity of the online dating industry has grown immensely over the past decade. There is an abundance of online dating websites with various features to attract users. The Semantic Web is a major endeavor that aims to have information on the web be not only machine-readable but also machine-understandable. Online dating is a good candidate for such a service since it is based primarily on user-provided information. By organizing user information in a comprehensive knowledge base, users can be matched more efficiently. In this thesis, semantic web tools, such as ontology languages and reasoning software, were investigated to determine which ones would work best in the online dating website model. An ontology is presented that models the properties of user profiles on a dating website, as well as a semantic matching system. Rules and reasoning are used to infer additional facts about users to be used in the matching process, thereby providing a more accurate match. This prototype, a semantic matchmaking system for online dating, has been implemented in Java using the Jena interface. Results of running the prototype on a commercial dating website are reported., Electronic Only. (UNB thesis number) Thesis 9205. (OCoLC) 960909545, M.C.S., University of New Brunswick, Faculty of Computer Science, 2013.
A sentiment analysis framework for social issues
by Mostafa Karamibekr, Sentiment analysis investigates attitudes, feelings, and expressed opinions regarding products, services, topics, or issues. Subjectivity classification, which categorizes text as objective or subjective, is one application of sentiment analysis. Sentiment classification, another application, categorizes the polarity of an opinion, mostly as positive or negative. This research focuses on the sentiment analysis of social issues. We have conducted a study that statistically shows that the factors affecting opinions in product domains are different from those in social domains. Based on the findings of this study, a framework is proposed for sentiment analysis of social issues. This framework considers the role of verbs in sentiment and defines a quadruple structure for an opinion that consists of opinion author, opinion target, opinion expression, and opinion time. One of the benefits of the proposed framework is that it extracts expressed opinions that can be used for various applications such as subjectivity classification, sentiment polarity classification, sentiment summarization, sentiment visualization, and sentiment comparison. We have evaluated the performance of our proposed framework for sentiment analysis of public comments regarding abortion as a social issue. We have implemented two applications of sentiment analysis: subjectivity classification and polarity classification., (UNB thesis number) Thesis 9565. (OCoLC)963942354. Electronic Only., Ph.D. University of New Brunswick, Faculty of Computer Science, 2015.
A simplex-cut method for nearest facets in Minkowski polytopes
by Zhan Gao, Support Vector Machine (SVM) algorithms are well-known machine learning algorithms focused on classification and regression. The main idea of an SVM problem is to find a function that separates two data sets with a maximum margin. In this thesis, we focus on solving the linear SVM problem where the training sets cannot be separated by a linear function. We follow Cui’s thesis [6], which converts the problem into a Minkowski Norm Minimization problem. We first introduce the basic formulation of the SVM problem and some theory. We also briefly introduce Cui’s geometric framework and show how the SVM problem can be viewed geometrically. In order to solve the Minkowski Norm Minimization problem, we propose two algorithms to trim the polytope. The implementation uses lrs [1] for enumeration and cdd [8] for linear programming; gmplib [7] is used for multi-precision calculation. The results of the experiments show that our methods have better performance than brute-force enumeration and that the cutting polytope method has better performance than cutting planes., Electronic Only. (UNB thesis number) Thesis 9389. (OCoLC) 961810521., M.C.S., University of New Brunswick, Faculty of Computer Science, 2014.
A software toolkit for stock data analysis using social network analysis approach
by Junyan Zhang, In this work, we design an online analytical toolkit that draws on the domain of Social Network Analysis. The objective is to provide a network-centric perspective for analyzing stock data to facilitate portfolio management. The core process of this toolkit is to create a network of stocks from the New York Stock Exchange (NYSE) and the National Association of Securities Dealers Automated Quotations (NASDAQ). Each node in this network represents a stock, and the weight of the edge between any two stocks is determined by the correlation coefficient calculated from the historical daily returns of the two stocks involved. With this network, there are several embedded functionalities designed for further analysis. Users can write their own scripts on top of this network, generate specific portfolios, simulate the historical trend against another index, and visualize the result for comparison. The software architecture of the toolkit is a client-server architecture in which the user interface, functional process logic, and data storage and access are developed and maintained as independent modules. This toolkit is evaluated through a case study on simulating the historical trend of the Dow Jones Industrial Average (DJIA), along with multiple experimental scenarios tested on this toolkit for system performance evaluation. As an important observation from this case study, a careful selection of alternative stock portfolios based on network criteria shows trends similar to the DJIA. While the latter is a portfolio constructed mainly based on the importance (or size) of the constituent stocks, our network-centric construction of alternative portfolios illustrates that the phenomenon of "too-connected-to-be-included" is as important as (if not more important than) "too-big-to-be-included". This new observation has great potential in portfolio management by offering an alternative way of selecting stocks: size matters, but connection may matter more!
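The network construction described above (one node per stock, edge weight given by the correlation of historical daily returns) can be sketched roughly as follows. This is a minimal illustration rather than the toolkit's code: the function names, the use of simple returns, the Pearson coefficient, and the ticker symbols are all assumptions.

```python
from itertools import combinations
from math import sqrt


def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length series.

    Assumes neither series is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_network(price_history):
    """Map each unordered pair of tickers to the correlation of their returns."""
    returns = {t: daily_returns(p) for t, p in price_history.items()}
    return {
        (a, b): pearson(returns[a], returns[b])
        for a, b in combinations(sorted(returns), 2)
    }


# Made-up prices for three hypothetical tickers.
history = {
    "AAA": [100.0, 110.0, 99.0, 108.9],
    "BBB": [50.0, 55.0, 49.5, 54.45],
    "CCC": [10.0, 9.0, 9.9, 8.91],
}
for pair, weight in build_network(history).items():
    print(pair, round(weight, 3))
```

A real toolkit would likely threshold or filter these weights before analysis, since a fully connected graph over thousands of stocks is hard to interpret; the abstract does not say which rule is used.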
Accelerating main memory query processing for data analytics
by Puya Memarzia, Data analytics provides a way to understand and extract value from an ever-growing volume of data. The runtime of analytical queries is of critical importance, as fast results enhance decision making and improve user experience. Data analytics systems commonly utilize in-memory query processing techniques to achieve better throughput and lower latency. Although processing data that is already in the main memory is decidedly speedier than disk-based query processing, this approach is hindered by limited memory bandwidth and cache capacity, resulting in the under-utilization of processing resources. Furthermore, the characteristics of the hardware, data, and workload, can all play a major role in hindering execution time, and the best approach for a given application is not always clear. In this thesis, we address these issues by investigating ways to design more efficient algorithms and data structures. Our approach involves the systematic application of application-level and system-level refinements that improve algorithm efficiency and hardware utilization. In particular, we conduct a comprehensive study on the effects of dataset skew and shuffling on hash join algorithms. We significantly improve join runtimes on skewed datasets by modifying the algorithm’s underlying hash table. We then further improve performance by designing a novel hash table based on the concept of cuckoo hashing. Next, we present a six-dimensional analysis of in-memory aggregation that breaks down the variables that affect query runtime. As part of our evaluation, we investigate 13 different algorithms and data structures, including one that we specifically developed to excel at a popular query category. Based on our results, we produce a decision tree to help practitioners select the best approach based on aggregation workload characteristics. 
After that, we dissect the runtime impact of NUMA architectures on a wide variety of query workloads and present a methodology that can greatly improve query performance with minimal modifications to the source code. This approach involves systematically modifying the application’s thread placement, memory placement, and memory allocation, and reconfiguring the operating system. Lastly, we design a scalable query processing system that uses distributed in-memory data structures to store, index, and query spatio-temporal data, and demonstrate the efficiency of our system by comparing it against other data systems.
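The abstract mentions a novel hash table based on the concept of cuckoo hashing but gives no design details. For readers unfamiliar with the idea, here is a textbook-style sketch of plain cuckoo hashing with two tables; it is not the thesis's data structure, and the class name, salted use of Python's `hash`, and growth policy are assumptions chosen for brevity.

```python
class CuckooHashTable:
    """Two-table cuckoo hashing: every key lives in one of two candidate slots."""

    def __init__(self, capacity=11, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Derive two hash functions by salting the key with the table index.
        return hash((which, key)) % self.capacity

    def get(self, key, default=None):
        # A lookup inspects at most two slots.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        # Otherwise insert, evicting ("kicking") occupants to their other table.
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which
        # Too many evictions: grow both tables and reinsert everything.
        self._rehash(entry)

    def _rehash(self, pending):
        old = [e for t in self.tables for e in t if e is not None] + [pending]
        self.capacity = self.capacity * 2 + 1
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in old:
            self.put(k, v)


table = CuckooHashTable()
for k in range(50):
    table.put(k, k * k)
print(table.get(7))  # 49
```

The appeal for join and aggregation workloads is the bounded lookup cost: a probe touches at most two slots, at the price of occasional eviction chains and rehashes on insert.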
