Identifying source printers of documents using Image Processing & Machine Learning

Different types of printed documents are used in our day to day activities. Those documents include financial, legal, identity, currency and other types of sensitive documents. Printing has become very convenient and can be done at a low cost and thus makes forging documents quite easy. This has lead to forgery cases to go on the rise in recent times. Law enforcement agencies can gain a lot of sophisticated source printer identification techniques in order to facilitate with their investigations. Because of these reasons, efficient and simple printer detection techniques are needed to be created. An important aspect in forgery detection using digital investigation is to ascertain the source of a printed document. Clues about type, brand or model of the printing machine could help in distinguishing the forged documents from huge volumes of printed documents like in the case of counterfeit currencies. We use a publicly available dataset of different printers with different brands and models and test our approach on that. We extract such clues from documents printed from different printers using image processing techniques and use them to link those documents to their source printer. This research uses techniques from image processing and data classification to accurately perform the previously mentioned task. Those techniques would involve feature extraction from documents and using those features to train classifiers to determine which document belongs to which printer. We explore deep learning on our data which has never been performed before on this problem and then we apply traditional features on the same data and compare the results to determine whether deep learning is suited to this problem or not. With the help of this research, we can see that deep learning may not perform as well as other features that are built to detect texture but it still has potential which can be seen from the decent performance of neural networks on this task. Neural nets that may be specifically trained for this task could possibly outperform traditional approaches.