Detecting Semantically Similar Sentences Using Deep Learning Techniques

In this research, we present a novel approach to enhancing text classification using improved word representations combined with sentence embeddings. Word2vec, a vector-based word representation, can be used in conjunction with vector-based sentence representations to improve classifier accuracy on text classification tasks. We demonstrate how distributed representations of words can be used for sentence classification by jointly learning both representations to detect semantic relatedness between sentences. Distributed representations of words capture semantic associations between words; however, comparatively little effective work has been done on sentence classification. Our findings also suggest that the CBOW model performs better on semantic tasks than the skip-gram model.

The primary objective of this thesis is to surface semantic relatedness between question pairs. A deep neural network, a Multi-Layer Perceptron, is used for sentence encoding, and multiple distance and similarity measures are used for similarity estimation. Different experiments are performed to compute sentence vectors, which are then used to classify question pairs as similar or dissimilar: pairs that ask with the same intent but look different because of different wordings. This is a challenging task because in online user forums it is crucial to guarantee that each question occurs only once, so that it is answered only once, which improves the quality of the knowledge base. This also saves experts' time, because they do not have to answer the same questions again and again.

To date, word2vec has achieved state-of-the-art performance in natural language processing tasks; however, it is still inadequate for sentence classification on its own. To incorporate such information, we primarily focus on combining the semantic information of both words and sentences to produce high-quality vectors.
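As an illustration of the core pipeline described above, the sketch below averages word vectors into sentence vectors and thresholds their cosine similarity to label a question pair as duplicate or not. The toy word vectors, vocabulary, and threshold value are illustrative assumptions only; they stand in for the trained word2vec embeddings and the MLP-based sentence encoder used in this work.

```python
import numpy as np

# Toy word vectors standing in for pretrained word2vec (CBOW) embeddings.
# These values are illustrative assumptions, not learned representations.
word_vectors = {
    "how":    np.array([0.1, 0.3, 0.0]),
    "do":     np.array([0.2, 0.1, 0.1]),
    "i":      np.array([0.0, 0.2, 0.3]),
    "learn":  np.array([0.7, 0.1, 0.5]),
    "study":  np.array([0.6, 0.2, 0.5]),
    "python": np.array([0.4, 0.9, 0.2]),
    "cook":   np.array([0.1, 0.1, 0.9]),
    "rice":   np.array([0.2, 0.0, 0.8]),
}

def sentence_vector(sentence):
    """A simple sentence embedding: the mean of the word vectors."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def are_duplicates(q1, q2, threshold=0.9):
    """Label a question pair as duplicate if similarity exceeds the threshold."""
    return cosine_similarity(sentence_vector(q1), sentence_vector(q2)) >= threshold
```

A reworded question pair such as "how do i learn python" and "how do i study python" scores well above the threshold, while "how do i cook rice" falls below it; in the actual system, the averaged embedding is replaced by an MLP-encoded sentence vector and the threshold by a trained classifier over several similarity measures.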
The resulting word vectors show consistent improvements over baseline techniques. We summarize our best results on a well-known, publicly available benchmark dataset for text classification. Experimental results and a quantitative comparison with other modern techniques are provided to demonstrate the effectiveness of the proposed techniques.