Construction of a Large-Scale Chinese Speech Dataset for Parkinson's Disease and Machine Learning-Based Classification

(Original Chinese title: 大规模帕金森病汉语语音数据集构建及机器学习分类研究)

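Author: Fang Hao (方皓)
  • Student ID
    2018******
  • Degree
    Master's
  • Email
    fan******com
  • Defense date
    2023.09.05
  • Advisor
    Li Luming (李路明)
  • Discipline
    Aeronautical and Astronautical Science and Technology
  • Pages
    53
  • Confidentiality level
    Public
  • Degree-granting unit
    031 School of Aerospace Engineering (航院)
  • Keywords
    Parkinson's Disease, Hypokinetic Dysarthria, Chinese Speech, Large-scale Dataset, Machine Learning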

Abstract

Parkinson's disease (PD) is a common neurodegenerative disorder whose pathogenesis remains unclear and for which no curative treatment exists. Substantial evidence suggests that patients who are diagnosed early can receive intervention in the early stages of the disease, which slows its progression. Dysarthria occurs in approximately 90% of PD patients and is characterized by a persistently reduced range of movement, hence the term "hypokinetic dysarthria". Because dysarthria may be one of the earliest symptoms of PD, speech signals have the potential to serve as a biomarker for early diagnosis. Many studies on non-Chinese speech have successfully identified PD patients from their speech, but no publicly available Chinese PD speech dataset exists, and research on Chinese speech is still in its infancy: most studies suffer from small sample sizes, poorly matched control groups, and overly simple speech tasks.

To address these problems, this thesis designs a set of standard speech tasks that covers all commonly used initials and finals, distributes the four tones roughly evenly, and draws on text used frequently in daily life, and it builds a speech collection paradigm on this text corpus. To ensure that the recordings reflect the diversity of accents in China, speech was collected remotely by telephone recording from 1200 participants recruited from 30 provinces, municipalities, and autonomous regions, with the PD patient group and the healthy control group matched in age and gender. After data processing and annotation, the resulting large-scale Chinese PD speech dataset contains 25,893 speech segments from 1035 participants, with a total duration of 48.76 hours, together with the participants' demographic, clinical, and dysarthria information.

Statistical analysis of PD speech impairment on this dataset shows that the severity of dysarthria in PD patients is strongly correlated with the presence of gait disturbance. Based on hierarchical clustering, a subjective dysarthria rating scale better suited to PD patients, VHI-10-PD, is proposed. Finally, a speech-based PD classification algorithm is built in three steps: feature extraction, feature selection, and machine learning model training. Tested on the collected dataset, it reaches a maximum classification accuracy of 66.8% for individual speech samples and 74.5% for participants when multiple speech samples are combined. Among the five speech tasks, free speech yields the highest classification accuracy, which suggests that PD patients unconsciously reveal dysarthria in everyday communication, whereas during reading tasks they consciously adjust their pronunciation to sound more normal. This work can serve as a foundation for research on the characterization of Chinese PD speech, supports the development of early diagnosis systems for Chinese PD patients, and offers suggestions for designing voice training programs for patients.
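
The abstract reports both a per-sample accuracy and a higher per-participant accuracy obtained from multiple samples, but it does not say how the per-sample predictions are combined. One common option is majority voting, sketched below purely as an illustration. The function names `aggregate_by_subject` and `accuracy`, the label encoding (1 = PD, 0 = healthy control), the tie-breaking rule, and the toy data are all assumptions, not details taken from the thesis.

```python
from collections import defaultdict


def aggregate_by_subject(subject_ids, sample_predictions):
    """Combine per-sample labels into one label per subject by majority vote.

    Hypothetical helper: labels are assumed to be 1 (PD) or 0 (healthy
    control), and ties are resolved in favour of 1 (an arbitrary choice).
    """
    if len(subject_ids) != len(sample_predictions):
        raise ValueError("subject_ids and sample_predictions must have equal length")

    # Group the predicted labels of all samples belonging to each subject.
    votes = defaultdict(list)
    for subject, label in zip(subject_ids, sample_predictions):
        votes[subject].append(label)

    # A subject is labelled PD when at least half of their samples are PD.
    return {
        subject: int(2 * sum(labels) >= len(labels))
        for subject, labels in votes.items()
    }


def accuracy(predicted, truth):
    """Fraction of subjects whose predicted label matches the true label."""
    return sum(predicted[s] == truth[s] for s in truth) / len(truth)


# Toy data, invented for illustration only (not drawn from the dataset).
subject_ids = ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3"]
sample_predictions = [1, 0, 1, 0, 0, 1, 1, 1]
subject_labels = aggregate_by_subject(subject_ids, sample_predictions)
```

Voting of this kind can outperform single-sample classification when the errors on one participant's recordings are not perfectly correlated, which would be consistent with the gap between 66.8% and 74.5%; the thesis may well use a different aggregation scheme, such as averaging predicted probabilities.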