A detection framework for Android financial malware
University of New Brunswick
As attempts to thwart cybercrime have intensified, so have innovations in how cybercriminals provision their infrastructure to sustain their activities. What motivates cybercriminals has also changed, from ego and status to interest and money, and cybersecurity researchers have accordingly turned their attention to financial malware, especially on the Android platform. Researchers face some major issues in detecting Android financial malware. First, what constitutes Android financial malware is still ambiguous. Malware labelling is inconsistent: most current detection systems emphasize recognizing generic malware types (e.g., Trojan, Worm) rather than indicating capabilities (e.g., banking malware, ransomware). Without knowing what constitutes financial malware, detection systems cannot accurately recognize advanced and sophisticated financial malware. Second, most current anomaly-based detection systems that use machine learning suffer from inaccurate evaluation and comparison due to the lack of adequate datasets, which results in unreliable outputs for real-world deployment. Because the underlying processes are time-consuming, most available datasets are crafted mainly for static analysis, and those created for dynamic analysis rely on samples installed on an emulator or sandbox. Sophisticated malware can bypass these approaches: malware authors employ obfuscation methods and a wide range of anti-emulator techniques, with which the malware detects the emulator and hides its malicious activities. These deficiencies are among the major reasons why Android financial malware is able to avoid detection. Deploying reliable defence mechanisms against these attacks requires a comprehensive understanding of existing Android financial malware attacks, supported by a unified terminology and a high-quality dataset.
We therefore seek to understand trends and relationships among Android malware families and devise a taxonomy of Android financial malware attacks. In addition, we present a systematic approach to generating the required datasets that addresses the need to use physical platforms instead of emulators. To this end, we develop an automated dynamic analysis system that runs on smartphones and generates the desired dataset in a testbed environment. To correlate the generated dataset with the proposed taxonomy, we present a hybrid framework for malware detection. We propose a novel combination of static and dynamic analysis based specifically on features derived from string literals (extracted statically via reverse engineering) and network flows (captured dynamically on smartphones). This combination can help security analysts recognize threats effectively. We employ five common classifiers to construct the best model for identifying malware at four levels: detecting malicious Android apps, classifying Android apps by malware category and by sub-category, and characterizing Android apps by malware family. A dataset containing over 5,000 samples is used to evaluate the performance of the proposed method. The experimental results show that the proposed method with a Random Forest classifier achieves an accuracy of over 90% with a low false positive rate of 4% on average.
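As a rough illustration of the hybrid idea described above, the following Python sketch concatenates string-literal counts (the static side) with simple network-flow statistics (the dynamic side) into a single feature vector that a classifier could consume. The function names, the vocabulary, and the chosen flow statistics are assumptions made purely for illustration; they are not the feature set used in this work.

```python
from collections import Counter


def string_features(literals, vocabulary):
    """Count occurrences of each vocabulary term in an app's string literals.

    `literals` is a list of strings assumed to come from reverse engineering;
    `vocabulary` is a hypothetical, fixed list of lowercase terms of interest.
    """
    counts = Counter()
    for literal in literals:
        for token in literal.lower().split():
            counts[token] += 1
    return [counts[term] for term in vocabulary]


def flow_features(flows):
    """Summarize network flows given as (duration_seconds, bytes_sent) pairs.

    Returns [number of flows, mean bytes per flow, mean duration per flow].
    These three statistics are illustrative assumptions only.
    """
    if not flows:
        return [0, 0.0, 0.0]
    total_duration = sum(duration for duration, _ in flows)
    total_bytes = sum(sent for _, sent in flows)
    return [len(flows), total_bytes / len(flows), total_duration / len(flows)]


def hybrid_vector(literals, flows, vocabulary):
    """Concatenate static (string) and dynamic (flow) features for one app."""
    return string_features(literals, vocabulary) + flow_features(flows)


# Hypothetical example inputs for a single app.
vocabulary = ["password", "sms", "bank"]
literals = ["Enter bank password", "send SMS"]
flows = [(2.0, 100), (4.0, 300)]
vector = hybrid_vector(literals, flows, vocabulary)
```

In such a setup, one vector per app would be paired with a label at the desired level (malicious or benign, category, sub-category, or family) before a classifier is trained.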