Crime Forecasting Using Data Mining Techniques

The major challenge faced by law enforcement agencies is analyzing voluminous amounts of data quickly and efficiently to support the investigation process. Crime affects our society in many different ways, and controlling it requires a system that analyzes data proficiently, determines hidden patterns in the data, and performs crime forecasting. Our research makes use of data mining techniques such as association mining, clustering, classification, and regression to find hidden trends in data and predict crime. Association mining builds rules on the basis of the most prominent trends or most frequently recurring patterns found in a data set. Clustering helps predict crime by forming clusters of data and then classifying them into a number of classes based on the similarity or dissimilarity between crime incidents. Classification and regression are used for predicting events.
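
To illustrate how a rule can be scored by how often a pattern recurs, the following minimal sketch computes the standard support and confidence measures for a single association rule. The incident records, the attribute labels, and the function names are hypothetical and chosen only for illustration; they are not taken from the study's data set or implementation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    if not transactions:
        raise ValueError("transactions must not be empty")
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical incident records (made-up values, not the study's data).
incidents = [
    {"holiday", "theft"},
    {"holiday", "theft"},
    {"holiday", "assault"},
    {"weekday", "theft"},
]

# Score the candidate rule {holiday} -> {theft}.
rule_support = support(incidents, {"holiday", "theft"})
rule_confidence = confidence(incidents, {"holiday"}, {"theft"})
print(rule_support, rule_confidence)
```

A rule would typically be kept only if both measures exceed chosen thresholds; the thresholds themselves depend on the data set and are not specified here.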

In our research we followed a divide-and-conquer approach: we divided the problem into several analysis units and, by solving them, defined a methodology for crime prediction. A few important patterns (holidays vs. weekdays, lunar vs. non-lunar days, and major road networks) were analyzed by rule building. Spatial and temporal clustering (clustering and outlier analysis, heat maps, hotspot analysis) was performed to identify areas of statistical importance in terms of crime and to determine which major areas of Rawalpindi city are highly crime dense. Lastly, crime prediction was performed by regression and classification to predict crime (count, spectrum). The lowest mean absolute percentage error (MAPE), 30%, was achieved by linear regression, and an accuracy above 80% was achieved by every classifier in predicting crime spectrums on monthly data. Regression, however, gave better results with quarterly data. An important finding was how socioeconomic information, when added to the dataset, affected crime prediction and made the results more meaningful. Furthermore, a ranking method for testing socioeconomic features was used to identify which attributes are directly linked to crime prediction and which are irrelevant.
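
For reference, MAPE is the mean of the absolute errors expressed as a percentage of the actual values. The sketch below shows the standard definition; the function name and the monthly counts are hypothetical and are not results from this research.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    if any(a == 0 for a in actual):
        # MAPE is undefined when an actual value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical monthly crime counts (made-up values for illustration only).
actual_counts = [100, 200, 400]
predicted_counts = [110, 180, 400]

error_pct = mape(actual_counts, predicted_counts)
print(f"MAPE: {error_pct:.2f}%")
```

Because each error is divided by the actual count, months with very few recorded crimes can dominate the score, which is one possible reason coarser aggregation may behave differently from monthly data.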