Detecting hate speech is indispensable regardless of how widely a language is used, because hate speech inflicts great harm on society. This work presents a first resource for classifying the severity of hate speech, in addition to classifying offensive and hate speech content. Current research mostly limits hate speech classification to its primary categories, such as racism, sexism, and religious hate. However, hate speech targeted at different protected characteristics also manifests in different forms and intensities. It is important to understand the varying severity levels of hate speech so that the most harmful cases can be identified and dealt with earlier than the less harmful ones. In this work, we focus on detecting offensive speech, hate speech, and multiple severity levels of hate speech in the Urdu language. We investigate three primary target categories of hate speech: religion, ethnicity, and national origin. We further divide these categories into levels based on the severity of the hate conveyed. The severity levels are named symbolization, insult, and attribution. We collect and annotate a corpus of more than 20 thousand tweets covering these hate speech categories and severity levels. We conduct a comprehensive set of experiments using traditional as well as deep-learning-based models to compare their performance on hate speech detection. The highest macro-averaged F-scores for detecting offensive speech and hate speech with respect to ethnicity, national origin, and religious affiliation are 86%, 80%, 81%, and 72%, respectively. These results are encouraging and provide a lead for further investigation in this domain.
Publication: https://dl.acm.org/doi/abs/10.1145/3580476