Analyzing bias in public opinion and media reports on events in different geographical regions

Data mining is the extraction of hidden, useful, and valuable information from large, high-dimensional databases. The field is gaining significant recognition, particularly in the medical domain, because large amounts of data can now be easily collected and stored by computer systems. Lung cancer patients may undergo thoracic surgery, which is a high-risk procedure. Predicting the likelihood that a patient will survive after surgery is therefore a crucial step in post-operative risk management, and medical specialists need strong evidence to decide whether surgery should be recommended. The aim of this research is to improve the prediction of post-operative survival by employing various data mining techniques. For this purpose, a dataset of patients, named the "Thoracic Surgery Dataset", has been prepared and made available on the UCI Machine Learning Repository website. This dataset has been used in various studies, and a range of prediction and learning approaches have been applied to it. However, our analysis indicates that the main reason for the low performance of existing approaches is the class imbalance problem. We address this problem by applying pre-processing steps in the first stage and then improving classification performance with various ensemble classifiers. In pre-processing, the data undergoes feature extraction and sampling; a stacked generalization technique is then used to combine the results of suitable baseline machine learning algorithms.
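Random oversampling is one common way to carry out the sampling step on an imbalanced dataset such as the Thoracic Surgery Dataset. The abstract does not say which sampling method was used, so the following is only a minimal, standard-library Python sketch under that assumption; the function name `random_oversample` and the toy data are hypothetical. It duplicates randomly chosen rows of each smaller class until every class has as many rows as the majority class.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Balance classes by duplicating randomly chosen rows of smaller classes.

    Hypothetical helper: X is a list of feature rows, y the matching labels.
    Returns new lists; the inputs are not modified.
    """
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Rows belonging to this class form the pool we resample from.
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy example with made-up values: 6 "survived" rows vs 2 "died" rows.
X = [[1.0], [1.1], [0.9], [1.2], [1.0], [0.8], [3.0], [3.2]]
y = ["survived"] * 6 + ["died"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling of this kind should be applied only to the training split (inside each cross-validation fold); otherwise duplicated minority rows leak into the evaluation data and inflate the scores. The balanced training set would then feed the base learners of the stacked ensemble; in scikit-learn, for instance, `sklearn.ensemble.StackingClassifier` combines a list of base estimators through a final meta-estimator.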

News media is considered the voice of the public, representing the opinion of the people. However, news media is sometimes observed to be biased on certain topics. Nowadays, people use social media as a major platform for expressing their opinions, and the opinions presented on social media can differ from those presented by national news media. A third source, international news media, shapes how other countries perceive a given event.

In this research, we use these three sources to detect bias in opinion. For public opinion, we use Twitter, which is widely used by the public to express sentiment; tweets from the Pakistani region are collected to capture the opinion of the Pakistani public. For national news media, two major sources are used: "The Nation" and "Dawn", which are among the most popular English-language newspapers in the country. For international news, "Reuters" and "Al Jazeera" are used. After the dataset is collected, topical keyphrases are extracted and sentiment analysis is performed. The sentiments of news articles and tweets are then compared in order to classify each news article as biased or unbiased. The major finding of this research is that, for politics-related topics, both national and international news media are totally biased, whereas for general social issues the media presents the true sentiment of the public.

A web application, Biasness Detector, has been developed for this research, and the results it produces are evaluated. Evaluation is done manually, as there is no existing application against which the results could be compared.
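The article-versus-tweets comparison can be illustrated with a toy lexicon-based scorer. The abstract names neither the sentiment analysis method nor the exact decision rule, so everything here is an assumption: the word lists `POSITIVE` and `NEGATIVE`, the function names, and the rule that an article is labeled biased when its polarity differs from the majority polarity of tweets on the same topic are all hypothetical. A real system would use a proper sentiment model and would group tweets and articles by topic using the extracted keyphrases.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real system would use a trained model or a full lexicon.
POSITIVE = {"good", "great", "success", "progress", "support", "welcome"}
NEGATIVE = {"bad", "corrupt", "failure", "crisis", "unrest", "condemn"}


def polarity(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def public_polarity(tweets):
    """Majority polarity among tweets on one topic (ties go to the first seen)."""
    return Counter(polarity(t) for t in tweets).most_common(1)[0][0]


def classify_article(article, tweets):
    """Hypothetical rule: 'biased' if the article's polarity differs from the public's."""
    return "unbiased" if polarity(article) == public_polarity(tweets) else "biased"


# Made-up tweets and headlines about a single topic.
tweets = [
    "Great progress on the new road project",
    "Good to see support for local schools",
    "This project is a failure",
]
print(classify_article("Government hails success of road project", tweets))
print(classify_article("Road project mired in crisis and failure", tweets))
```

With these made-up inputs, two of the three tweets score positive, so the first headline agrees with the public and is labeled unbiased, while the second scores negative and is labeled biased. Comparing against the majority polarity is only one possible design; comparing average sentiment scores against a threshold would be an equally plausible reading of the abstract.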