Depression Detection using Machine Learning Technique

Mental health disorders have become highly prevalent, and depression is among the most widespread. According to WHO reports, depression is the second-leading cause of the global burden of disease. Amid the proliferation of such issues, social media has become a platform where people express themselves, so a user's social media activity can reveal a great deal about their emotional state and mental health. Given how pervasive the disease is, this paper presents a novel framework for depression detection from textual data, employing Natural Language Processing and deep learning techniques. For this purpose, a dataset of tweets was created and manually annotated by domain experts to capture both implicit and explicit depression context. Two variations of the dataset were created, one with binary labels and one with ternary labels. A deep-learning-based hybrid Sequence, Semantic, Context Learning (SSCL) classification framework with a self-attention mechanism is then proposed. It uses GloVe (pre-trained word embeddings) for feature extraction, LSTM and CNN layers to capture the sequence and semantics of tweets, and GRUs with a self-attention mechanism to focus on contextual and implicit information in the tweets. The framework outperformed existing techniques in detecting explicit and implicit context, with an accuracy of 97.4% on the binary-labeled data and 82.9% on the ternary-labeled data. The proposed SSCL framework was further tested on unseen data (random tweets), where it achieved an F1-score of 94.4%. Finally, to demonstrate the framework's strengths on a dataset from a different domain, it was validated on the “News Headline Data set” for sarcasm detection, where it also outperformed existing techniques in cross-domain validation.
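
To make the self-attention step more concrete, here is a minimal sketch of how attention re-weights a sequence of hidden states (such as the per-token outputs of a GRU). It is not the paper's implementation: the abstract does not specify the attention formulation, so this assumes plain scaled dot-product self-attention with no learned projections, and the `states` values are hypothetical toy vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention without learned projections.

    Assumption: queries, keys, and values are all the raw hidden states.
    `states` is a list of T vectors, each of dimension d.
    Returns (outputs, weights): T re-weighted vectors and the T x T weights.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in states:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all hidden states.
        outputs.append(
            [sum(row[j] * states[j][i] for j in range(len(states))) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical hidden states for a three-token tweet (toy values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(states)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the hidden states. In a full model, the weights would come from trained parameters, which is what lets the network emphasize tokens that carry implicit context.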

Published: https://www.mdpi.com/1424-8220/22/24/9775