A sarcasm detection framework in Twitter and blog posts based on varied range of feature sets
University of New Brunswick
This thesis addresses the problem of sarcasm detection by using a framework which is designed to effectively detect sarcastic blog and microblog posts. This framework consists of two components. Each component consists of different sub components including crawler, preprocessing and classification. The long text sarcasm detection classification consists of a two-step process, in each step, we use some feature sets along with different classifiers. These feature sets are utilized to analyze each blog post as a whole in addition to every isolated sentence. In the first step, Scoring Component is used to classify the documents into groups of sarcastic and non-sarcastic. Also in order to find sarcastic sentences in each sarcastic document, Decision Tree is applied. Considering the difficulties in sarcasm detection, the Document Level Sarcasm Detection achieved an outstanding result: 75.7% Precision rate. In the Short Text, Decision Tree is applied in order to classify the tweet texts into groups of sarcastic and non-sarcastic. Precision of 86.6% is obtained for this component which is very good considering the difficulty of sarcasm detection as well as inherent complexity of Twitter texts.