Improved Predictive Models for Thoracic Surgery Using Data Mining Techniques

Data mining is the extraction of hidden, useful, and valuable information from large, high-dimensional databases. The field is gaining significant recognition, particularly in the medical domain, because large amounts of data can now be easily collected and stored by computer systems. Lung cancer patients undergo thoracic surgery, which is a high-risk procedure. Predicting the likelihood that a patient will survive after surgery is therefore a crucial step in post-operative risk management, and medical specialists need strong evidence to decide whether a patient should be recommended for surgery. The aim of this research is to improve the prediction of post-operative survival by employing various data mining techniques. To support such prediction, a patient dataset named the “Thoracic Surgery Dataset” has been prepared and made available in the UCI Machine Learning Repository. This dataset has been used in several studies, which applied a variety of prediction and learning approaches. However, our analysis indicates that the main reason for the low performance of existing approaches is the class imbalance problem: patients in one outcome class greatly outnumber those in the other. We aim to address this problem by first applying pre-processing steps and then improving classification performance with ensemble classifiers. During pre-processing, the data undergo feature extraction and sampling; a stacked generalization technique is then used to combine the outputs of suitable baseline machine learning algorithms.
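The resample-then-stack pipeline described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation and makes several assumptions: random oversampling stands in for the unspecified sampling method, feature extraction is omitted, and the two base learners (nearest centroid and k-nearest neighbours) and the logistic-regression meta-learner are hypothetical choices. Following the usual stacked generalization recipe, the meta-learner is trained only on out-of-fold base-learner scores, and resampling is applied to training splits only, so duplicated minority rows never leak into held-out folds.

```python
import math
import random


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until classes 0 and 1 are equally frequent."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # nothing to resample from; return the data unchanged
        return list(X), list(y)
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sigmoid(s):
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)  # numerically stable branch for negative s
    return e / (1.0 + e)


# Hypothetical base learners: each returns a function x -> score for class 1.
def centroid_model(X, y):
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])

    def predict(x):
        d0, d1 = dist2(x, c0), dist2(x, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)  # near 1 when x is close to c1
    return predict


def knn_model(X, y, k=3):
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: dist2(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / len(nearest)  # share of class-1 neighbours
    return predict


# Hypothetical meta-learner: logistic regression fitted by batch gradient descent.
def logistic_model(Z, y, lr=0.5, epochs=500):
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the bias

    def score(z):
        return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1])

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = score(z) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return score


BASE_LEARNERS = [centroid_model, knn_model]


def fit_stack(X, y, folds=5, seed=0):
    """Stacked generalization: the meta-learner sees only out-of-fold base scores."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    meta_X = [None] * len(X)
    for f in range(folds):
        held = set(order[f::folds])
        train = [i for i in order if i not in held]
        # Oversample the training split only; held-out rows stay untouched.
        Xt, yt = random_oversample([X[i] for i in train], [y[i] for i in train], seed)
        models = [fit(Xt, yt) for fit in BASE_LEARNERS]
        for i in held:
            meta_X[i] = [m(X[i]) for m in models]
    meta = logistic_model(*random_oversample(meta_X, y, seed))
    base = [fit(*random_oversample(X, y, seed)) for fit in BASE_LEARNERS]  # refit on all data
    return lambda x: meta([m(x) for m in base])


if __name__ == "__main__":
    # Synthetic, imbalanced toy data (NOT the Thoracic Surgery Dataset): 40 vs 8 rows.
    rng = random.Random(1)
    X = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(40)]
    X += [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(8)]
    y = [0] * 40 + [1] * 8
    predict = fit_stack(X, y)
    print(round(predict([0.0, 0.0]), 3), round(predict([3.0, 3.0]), 3))
```

In a real study the same structure would more likely be built with a library: scikit-learn's `sklearn.ensemble.StackingClassifier` trains its final estimator on cross-validated base predictions in the same way. The plain-Python version only makes each step visible.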