Detection of Fake Apps Through Hybrid Machine Learning and NLP Techniques

Smartphones are now used by nearly everyone, and their users need a wide variety of applications installed on them. There are 3.48 million apps on the Google Play Store, and demand for apps is increasing day by day. App developers want their apps at the top of the leaderboard so that users can discover them and the apps earn revenue. Some developers, however, publish fake apps with false promises, which harm users in terms of privacy, security, viruses, and wasted time. Google Play Protect (GPP), developed by Google, examines user feedback, the number of installs and uninstalls, and the permissions an app requests, and tells the user whether an app is fake or real. Even so, GPP sometimes fails to identify fake and malicious apps, because it does not find the correlation between an app's listing description and the permissions the app requests. This research therefore defines a novel fake app detection model that concatenates features such as reviews, rankings, ratings, requested permissions, and number of installs; a machine learning classifier then predicts whether an app is malicious. Such a model is useful for identifying fake apps before they cause serious damage to users. Fraud is also widespread in the Play Store market: the Internet water army, a group of Internet ghostwriters paid to post online comments through fake accounts, easily deceives users, who then install apps on the basis of fake comments. It is therefore necessary to distinguish fake apps from real ones. This paper proposes a new architecture based on the correlation between these features. Feature extraction techniques, including TF-IDF and word embeddings such as GloVe, are applied together with machine learning and deep learning models.
Accuracy, precision, and recall, computed from the confusion matrix, are used as evaluation metrics. Experiments are performed using BERT, LSTM, and BiLSTM, along with existing machine learning classifiers: Random Forest, Decision Tree, Support Vector Machine (SVM), KNN, and LR. These classifiers were chosen for their accuracy; among the traditional machine learning classifiers, SVM worked well with binary data. The experimental results show that the proposed model achieves an accuracy of 93% with BERT, 91% with LSTM, and 93% with BiLSTM, while the traditional machine learning algorithms reach accuracies of 70 to 80%. Overall, the model shows promising results, and the deep learning methods outperform several baseline approaches.
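As an illustration of how accuracy, precision, and recall follow from a confusion matrix, the sketch below computes all three for a binary fake/real labelling. It is a minimal example and not the implementation used in the experiments: the function names, the label encoding (1 = fake, 0 = real), and the sample labels are assumptions made for illustration only.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` as the 'fake app' class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of the apps flagged as fake, how many really are fake.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the truly fake apps, how many were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 1 = fake app, 0 = real app.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = evaluate(y_true, y_pred)
```

With these sample labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all three metrics equal 0.75. For fake app detection, recall is the one to watch, since a missed fake app (a false negative) is what exposes the user to harm.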