Enhancing Dysarthria Diagnosis: Leveraging Deep Learning Techniques with the TORGO Dataset

Dysarthria is a motor speech disorder that occurs when the muscles involved in speech production are weakened or poorly coordinated, often as a result of neurological conditions such as stroke, cerebral palsy, or other neurological diseases. Early detection is critical for timely intervention and treatment planning, which can improve the quality of life of people with dysarthria. Traditional methods for assessing speech impairment, such as subjective evaluations and conventional acoustic analyses, are often time-consuming, prone to bias, and inefficient. Advances in machine learning, particularly deep learning techniques such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, offer new opportunities for detecting dysarthria automatically from speech signals. However, challenges such as high-dimensional data, overfitting, and data scarcity remain. This thesis presents a novel deep learning model for the automatic detection of dysarthria in speech data. The model combines a SincNet layer, which uses band-pass filters based on the sinc function to extract features directly from raw audio, with CNN and LSTM layers that capture the spatial and temporal dynamics of speech signals. By integrating these components, the proposed model aims to learn features from raw audio and to handle sequential data effectively. The study’s objectives are to develop and evaluate the proposed model for dysarthria detection, to compare its performance with existing models, and to examine the factors contributing to its success. The model’s robustness and generalization capability are evaluated on the publicly available TORGO dataset, on which it achieves an overall accuracy of 99%.
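The SincNet idea described above can be illustrated with a small sketch: each filter is a band-pass FIR kernel defined only by its two cutoff frequencies, built as the difference of two sinc low-pass filters and tapered by a window. This is not the thesis's implementation. The 16 kHz sample rate, the 300 to 3400 Hz cutoffs, the 251-tap kernel length, the Hamming window, and the function names (`sinc_bandpass_kernel`, `magnitude_response`, `filter_signal`) are illustrative assumptions. In an actual SincNet layer the cutoffs would be learnable parameters updated by backpropagation and a whole bank of such filters would be convolved with the raw waveform; here a single filter with fixed cutoffs is written in plain Python for clarity.

```python
import math


def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x) / x, with the removable singularity sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass_kernel(f1_hz: float, f2_hz: float, sample_rate: float,
                         length: int = 251) -> list[float]:
    """Hamming-windowed band-pass FIR kernel built from two sinc low-pass filters:
    g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n),
    where f1 and f2 are the cutoffs divided by the sample rate (cycles per sample).
    Assumed parameters for illustration; a SincNet layer would learn f1 and f2."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the kernel is centred on n = 0")
    if not 0.0 <= f1_hz < f2_hz < sample_rate / 2:
        raise ValueError("need 0 <= f1 < f2 < Nyquist frequency")
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate
    half = length // 2
    kernel = []
    for i in range(length):
        n = i - half  # time index relative to the kernel centre
        # Ideal band-pass = low-pass at f2 minus low-pass at f1.
        ideal = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        # Hamming taper reduces ripple caused by truncating the infinite sinc.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(ideal * window)
    return kernel


def magnitude_response(kernel: list[float], freq_hz: float, sample_rate: float) -> float:
    """|H(f)| of the centred kernel at one frequency (discrete-time Fourier transform)."""
    f = freq_hz / sample_rate
    half = len(kernel) // 2
    re = sum(k * math.cos(2 * math.pi * f * (i - half)) for i, k in enumerate(kernel))
    im = -sum(k * math.sin(2 * math.pi * f * (i - half)) for i, k in enumerate(kernel))
    return math.hypot(re, im)


def filter_signal(signal: list[float], kernel: list[float]) -> list[float]:
    """Same-length 1-D convolution with zero padding, as a conv layer would apply the kernel:
    y[t] = sum_m k[m] * x[t - m], with m running from -half to +half."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, k in enumerate(kernel):
            idx = t + half - i  # equals t - m with m = i - half
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs = 16000.0
    kernel = sinc_bandpass_kernel(300.0, 3400.0, fs)
    for freq in (50.0, 1000.0, 6000.0):
        print(f"|H({freq:.0f} Hz)| = {magnitude_response(kernel, freq, fs):.4f}")
```

Because the kernel depends only on `f1_hz` and `f2_hz`, a SincNet-style filter needs just two parameters instead of one weight per tap. This compactness is usually cited as one reason the approach suits small speech datasets.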