Sentiment Analysis On Complex Sentences Of Urdu With Negations Using Deep Learning

Natural Language Processing (NLP) is a growing field of Artificial Intelligence concerned with interaction between computers and human language. Negation is important in NLP because it can reverse the polarity of a sentence, and negation detection involves two tasks: recognizing the negation cue and determining its scope. Negation detection has been studied for English; for example, in the biomedical domain, the BioScope corpus is a collection of biomedical texts annotated with negation cues and their scopes. No comparable research had been done for Urdu, where negation detection is made harder by the language's morphologically rich structure. In this thesis, a corpus was created from BBC Urdu News articles. Starting from the BioScope annotation guidelines, additional rules suited to Urdu were devised and applied to this corpus. The corpus comprises 1,600 sentences from four domains, including politics and sports. Three types of negation cue were identified in the corpus: single, multiple, and prefix cues. Annotation was carried out by three domain experts, and inter-annotator agreement was measured with the Kappa statistic. The annotated corpus was then used to build a machine learning method based on Conditional Random Fields (CRF) that detects negation cues and scopes automatically. The system detected negation cues with 100% precision, 94% recall, and 96% F-measure, and negation scopes with 75% precision, 81% recall, and 77% F-measure. We further investigated the effect of automatically detected negation on sentence-level Sentiment Analysis by performing Sentiment Analysis on the BBC Urdu News corpus with and without negation detection. Using negation detection increased accuracy from 76.4% to 82.6%.
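The abstract does not state which Kappa variant was used. With three annotators, one common choice is Fleiss' kappa, so the minimal sketch below assumes that variant and token-level labels; the label names `CUE` and `O` are hypothetical. It compares the observed pairwise agreement on each item with the agreement expected by chance from overall label proportions.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each rated by the same number of annotators.

    ratings: list of items; each item is a list of labels, one per annotator.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()   # how often each label was used overall
    observed = 0.0       # running sum of per-item agreement P_i
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # P_i: fraction of annotator pairs that agree on this item
        observed += sum(c * (c - 1) for c in counts.values()) / (n_raters * (n_raters - 1))
    p_bar = observed / n_items
    # P_e: chance agreement from the overall label proportions
    p_e = sum((n / (n_items * n_raters)) ** 2 for n in totals.values())
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical data: three annotators label four tokens as negation cue ("CUE") or not ("O").
ratings = [
    ["CUE", "CUE", "CUE"],
    ["O", "O", "O"],
    ["CUE", "CUE", "O"],
    ["O", "O", "O"],
]
print(round(fleiss_kappa(ratings), 3))  # 0.657
```

A single disagreement on one token is enough to pull kappa well below 1, because chance agreement is high when only two labels are in use.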
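A CRF tagger for cue and scope detection usually treats each sentence as a sequence of tokens, with each token described by a feature dictionary. The sketch below builds such dictionaries in the list-of-dicts form accepted by CRF toolkits such as sklearn-crfsuite. The cue lexicon `NEG_WORDS`, the prefix list `NEG_PREFIXES`, and all feature names are hypothetical illustrations built from common Urdu negation forms. The abstract does not list the thesis's actual features or cue inventory.

```python
# Hypothetical, illustrative cue lists; not the cue inventory used in the thesis.
NEG_WORDS = {"نہیں", "نہ", "مت", "بغیر"}   # nahin, na, mat, baghair
NEG_PREFIXES = ("غیر", "نا", "بے")          # ghair-, na-, be-

def token_features(tokens, i):
    """Feature dict for token i, usable as one item in a CRF input sequence."""
    word = tokens[i]
    cue_positions = [j for j, t in enumerate(tokens) if t in NEG_WORDS]
    feats = {
        "bias": 1.0,
        "word": word,
        "is_neg_word": word in NEG_WORDS,
        # Surface check only: words like "نام" also match "نا"; the CRF learns how far to trust it.
        "has_neg_prefix": any(word.startswith(p) and len(word) > len(p) for p in NEG_PREFIXES),
        # Distance to the nearest lexical cue as a categorical string ("none" if no cue).
        "dist_to_cue": str(min(abs(i - j) for j in cue_positions)) if cue_positions else "none",
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
        feats["prev_is_neg_word"] = tokens[i - 1] in NEG_WORDS
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
        feats["next_is_neg_word"] = tokens[i + 1] in NEG_WORDS
    else:
        feats["EOS"] = True  # end of sentence
    return feats

def sentence_features(tokens):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]

# "He did not go to school yesterday"
example = ["وہ", "کل", "اسکول", "نہیں", "گیا"]
example_feats = sentence_features(example)
```

The distance feature is included because scope tokens tend to cluster around the cue. Encoding it as a string makes the CRF treat each distance as a separate indicator rather than as a numeric weight.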
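Precision, recall, and F-measure for cue or scope detection come from comparing gold and predicted labels. The sketch below assumes token-level evaluation with one label per token. The abstract does not say whether evaluation was done per token or per complete scope, and the label names are hypothetical. F-measure is the harmonic mean of precision and recall.

```python
def precision_recall_f1(gold, pred, label):
    """Token-level precision, recall and F-measure for one label (e.g. "CUE")."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correctly tagged
    fp = sum(1 for g, p in pairs if g != label and p == label)  # spurious tags
    fn = sum(1 for g, p in pairs if g == label and p != label)  # missed tags
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold and predicted cue labels for one sentence.
gold = ["O", "CUE", "O", "O", "CUE"]
pred = ["O", "CUE", "O", "O", "O"]
print(precision_recall_f1(gold, pred, "CUE"))  # precision 1.0, recall 0.5, F-measure ≈ 0.667
```

A tagger that never produces a false cue but misses some cues has perfect precision and lower recall, and the F-measure lands between the two.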
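The abstract does not describe how detected negation was fed into the sentiment classifier. The sketch below shows one simple option, assuming a toy polarity lexicon (`LEXICON`, hypothetical) and a negation scope supplied as token indices by the detector. Each sentiment word inside the scope has its polarity inverted before the sentence score is summed. This is a substitute rule-based technique for illustration only, not the model used in the thesis.

```python
# Hypothetical toy polarity lexicon: +1 positive, -1 negative.
LEXICON = {"اچھا": 1, "خوش": 1, "برا": -1, "ناکام": -1}   # good, happy, bad, failed

def sentence_polarity(tokens, scope=frozenset()):
    """Sum lexicon scores, inverting any word whose index lies in the negation scope."""
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        score += -value if i in scope else value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# "This decision is not good"
sentence = ["یہ", "فیصلہ", "اچھا", "نہیں", "ہے"]
print(sentence_polarity(sentence))             # positive (negation ignored)
print(sentence_polarity(sentence, scope={2}))  # negative (اچھا lies inside the detected scope)
```

The example shows why negation matters for sentence-level sentiment. Without scope information, the positive word "اچھا" decides the label even though the sentence negates it.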