面向语音合成的汉语语音语料库建设与分析

Building and Analysis of Mandarin Speech Corpus for TTS

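Author: Cui Dandan (崔丹丹)
  • Student ID
    0702******
  • Degree
    Master's
  • Defense date
    2006.03.01
  • Advisor
    Cai Lianhong (蔡莲红)
  • Discipline
    Computer Science and Technology (degrees in Engineering or Science may be conferred)
  • Pages
    75
  • Confidentiality level
    Public
  • Collection number
    07024198
  • Department
    024 Department of Computer Science
  • Chinese keywords
    语料库设计; 语料库标注; 语境特征
  • English keywords
    Corpus Design; Corpus Annotation; Context Features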

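Abstract

This thesis describes in detail the main work carried out over the past three years on building and analyzing speech corpora for Mandarin Chinese text-to-speech (TTS) systems. A speech synthesis corpus is the foundation of a waveform-concatenation Mandarin TTS system: it supplies the data needed for waveform concatenation and also provides the prosodic knowledge that guides the concatenation algorithm. Designing the corpus effectively and rationally is therefore one of the key techniques for ensuring the performance of a concatenative synthesis system. The thesis carries out corpus design in four steps (overall design, text design, annotation design, and organizational structure design), with strict specifications established for each step. The corpus text designed in the thesis comprises four parts: material for building the TTS system, material for testing the TTS system, special syllable groups, and material for analyzing Mandarin intonation. After recording, data preprocessing, annotation, organization, and distribution, the design and production of the speech synthesis corpus TH-CoSS were completed. The finished TH-CoSS corpus consists of more than 20,000 sentences in total, read by one male and one female speaker, and has been rigorously proofread. Its annotation system conforms to the internationally used XML (Extensible Markup Language) specification and marks prosodic hierarchy and segmental information. The thesis also developed software tools for corpus design, annotation, and analysis.

The thesis studies the role of context features in syllable clustering and analyzes the relation between context features and speech perception. Using the CART decision tree algorithm, it clusters the prosodic parameters of the syllables in the TH-CoSS corpus and analyzes the distribution of context features in terms of appearance rate and average level. To evaluate the influence of context features on prosodic realization, a weight function measuring the importance of context features is designed. This function is a useful reference for corpus text design and for setting the weights of unit-selection parameters in TTS systems.

To build a harmonious human-computer interaction environment, computers need the ability to understand and express emotion. The thesis studies the influence of prosodic features on emotion discrimination and emotional expression. It first collects emotional speech data and computes the acoustic parameters of emotional speech. To study the relation between perceived prosodic features and emotional expression, a speech emotion editor is designed and implemented; it can edit and modify the prosodic parameters of speech, producing different emotional expressions through prosody modification.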

This thesis presents our work over the past three years on building and analyzing a corpus for a Mandarin Chinese text-to-speech (TTS) system. The synthesis corpus is the basis of a concatenative synthesis system: it provides the TTS system with speech data as well as the prosodic knowledge that guides the unit selection algorithm. We divide corpus design into four modules: general design, text selection, annotation design, and structure and organization design. The text material consists of four parts: sentences for TTS system building, sentences for TTS system testing, special syllable groups, and sentences with special intonation. The implementation steps are speech recording, data preprocessing, annotation, organization, and publication. The finished corpus contains about 20,000 sentences read by one female and one male speaker. The annotation files are in XML format and include segmental and prosodic tags. All tags are manually checked. Software tools for corpus design, annotation, and analysis were developed as well.

Studying the influence of context features on the clustering of syllables by prosodic parameters provides a reference for relating context information to speech perception. We use CART to cluster the syllables in the TH-CoSS corpus by their prosodic parameters, analyze the distribution of context features (appearance rates and average levels), and propose an importance function that evaluates the weight of each context feature's influence on speech prosody. This function proves to be a valuable reference both for designing the text script of a speech corpus and for setting weights in TTS unit selection.

To make human-computer interaction more natural, the computer should be able to understand and express emotions. To analyze the role of prosodic features in emotion discrimination and expression, we collected an emotional speech corpus and analyzed its acoustic parameters. To support research on the relation between prosody and emotion, we developed an emotional speech editor, which can edit and modify the prosodic parameters of speech to achieve different emotions.
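
The context-feature analysis described in the abstract (growing a CART tree over prosodic parameters, then reading off each context feature's appearance rate and average level) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the three feature names (`tone`, `position`, `boundary`), the synthetic duration values, the tiny pure-Python regression tree, and in particular the `weight` formula (appearance rate divided by average level). The abstract does not state the thesis's actual importance function, features, or tree settings, so this should not be read as the author's method.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (gain, feature_index, threshold) of the best binary split, or None."""
    base = sse(targets)
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            gain = base - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def grow(rows, targets, splits, depth=1, max_depth=6, min_gain=1e-9):
    """Grow a regression tree recursively, logging (feature, depth) per split."""
    if depth > max_depth or len(rows) < 2:
        return
    found = best_split(rows, targets)
    if found is None or found[0] <= min_gain:
        return
    _, f, t = found
    splits.append((f, depth))
    for goes_left in (True, False):
        sub = [(r, y) for r, y in zip(rows, targets) if (r[f] <= t) == goes_left]
        grow([r for r, _ in sub], [y for _, y in sub],
             splits, depth + 1, max_depth, min_gain)


def feature_stats(splits, names):
    """Appearance rate, average level, and an illustrative weight per feature."""
    depths_by_feature = defaultdict(list)
    for f, depth in splits:
        depths_by_feature[f].append(depth)
    stats = {}
    for f, name in enumerate(names):
        depths = depths_by_feature.get(f, [])
        rate = len(depths) / len(splits) if splits else 0.0
        avg_level = sum(depths) / len(depths) if depths else None
        # Hypothetical weight: frequent use near the root scores higher.
        weight = rate / avg_level if depths else 0.0
        stats[name] = {"rate": rate, "avg_level": avg_level, "weight": weight}
    return stats


# Synthetic data (assumption): duration depends on tone and boundary only.
names = ["tone", "position", "boundary"]
rows = [(tone, pos, b) for tone in (1, 2, 3, 4) for pos in (0, 1, 2) for b in (0, 1)]
targets = [200 + 10 * tone + 80 * b for tone, pos, b in rows]

splits = []
grow(rows, targets, splits)
stats = feature_stats(splits, names)
for name, s in stats.items():
    print(name, s)
```

In this toy setup the tree splits on `boundary` at the root and on `tone` further down, while `position`, which has no effect on the synthetic durations, never appears. A real analysis would use measured prosodic parameters and the context features annotated in the corpus, and the weight function would need to be taken from the thesis itself.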