University of New Brunswick
The wide use of semantic information technologies helps to extract information from the vast amount of data available in electronic documents from both industry and academia. Because document structures are very diverse, document extraction, processing, and interpretation are difficult. Many software tools are available for these purposes, but they are domain specific or scenario specific and therefore cannot serve as general-purpose solutions. It is thus essential to develop a tool tailored to the problem of information extraction from electronic documents and their evaluation. The present work is part of a larger effort to create the “Research Map” of New Brunswick, which holds meaningful information about researchers from industry and academia and their domains of work. The purpose of the system is to identify research collaborations, which can indirectly lead to funding opportunities. Achieving this goal requires an application-specific information extractor, a classification algorithm, and methods for searching the information in the “Research Map”. The work is inspired by the New Brunswick Innovation Foundation. The present work classifies research proposals, extracts keywords based on their domain, and compares the same section of two research proposals using a comparison algorithm based on TF-IDF and cosine similarity. It also provides a GUI, implemented in Java Swing, to make the developed system easy to use.
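
As an illustration of the comparison step, the following is a minimal Java sketch of TF-IDF weighting followed by cosine similarity. It is not the implementation used in this work. The class and method names (TfIdfSketch, tokenize, idf, tfidf, cosine, similarity), the letter-based tokenizer, and the smoothed IDF formula are all assumptions introduced for illustration. Smoothing is assumed because, with only two documents, the unsmoothed weight ln(N/df) is zero for every shared term, which would force the similarity to zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names and formulas are illustrative assumptions.
class TfIdfSketch {

    // Lowercase the text and split on anything that is not a letter a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Smoothed inverse document frequency (an assumption):
    // idf(t) = ln((1 + N) / (1 + df(t))) + 1
    static Map<String, Double> idf(List<List<String>> corpus) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : corpus) {
            // Count each term at most once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> result = new HashMap<>();
        int n = corpus.size();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            result.put(e.getKey(), Math.log((1.0 + n) / (1.0 + e.getValue())) + 1.0);
        }
        return result;
    }

    // Sparse TF-IDF vector: relative term frequency times IDF.
    static Map<String, Double> tfidf(List<String> doc, Map<String, Double> idf) {
        Map<String, Double> vec = new HashMap<>();
        if (doc.isEmpty()) {
            return vec;
        }
        for (String term : doc) {
            vec.merge(term, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : vec.entrySet()) {
            double tf = e.getValue() / doc.size();
            e.setValue(tf * idf.getOrDefault(e.getKey(), 0.0));
        }
        return vec;
    }

    // Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Compare two proposal sections, treating them as a two-document corpus.
    static double similarity(String sectionA, String sectionB) {
        List<String> a = tokenize(sectionA);
        List<String> b = tokenize(sectionB);
        List<List<String>> corpus = new ArrayList<>();
        corpus.add(a);
        corpus.add(b);
        Map<String, Double> weights = idf(corpus);
        return cosine(tfidf(a, weights), tfidf(b, weights));
    }

    public static void main(String[] args) {
        // Example strings are made up for demonstration only.
        double score = similarity(
                "text mining of research proposals",
                "classification of research proposals");
        System.out.println("similarity = " + score);
    }
}
```

Under these assumptions, identical sections score 1, sections with no terms in common score 0, and partially overlapping sections fall in between. A practical system would likely compute IDF over the whole proposal collection rather than over only the two sections being compared.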