语音和听觉是人们交流、学习和理解事物的重要方式。然而,现实环境中,噪声和竞争语音总是不可避免的,严重影响了语音质量、听觉认知和感知。语音增强算法是众多语音应用中必不可少的部分。然而大部分算法很难同时提高客观语音质量和主观听觉感受。因此,有必要研究新的语音增强算法。与此同时,人们对噪声环境中听觉的心理和认知神经机制仍知之甚少,该研究不仅是神经科学和机器学习等领域的重点和热点,而且可以进一步指明语音增强算法的研究方向。本文首先提出了一种基于对称波束形成、FastICA和耳蜗滤波器组稀疏编码的语音增强算法。该算法利用语音和噪声的空间域位置、时域统计特性和频域的听觉编码的差异,实现了语音增强。给出了全面评价语音增强算法客观和主观性能的四种指标。结果表明,改变输入信噪比、声源波达方向及噪声类型等,新算法的客观质量、主观性能和稳健性等均优于现有的主要的语音增强算法。在对称波束形成的基础上,本文结合分数阶延时滤波器、包络调制频谱和自适应滤波提出了第二种语音增强算法。很多语音增强算法都假设语音必须来自前方而噪声来自后方,该假设不符合现实声场。新算法不再受此限制。实验结果表明,无论语音是否来自前方,经第二种语音增强算法处理后的语音的波形和语谱图有所改善。该算法计算量小,收敛速度快,受步长和频率影响小,实用性强。频率跟随响应FFR是研究听觉的重要脑电信号,然而现有的研究FFR的方法常常受本底噪声干扰。本论文提出了分析FFR的幅度谱SNR和锁相值方法,并提出检测显著有效响应的方法。我们完成了不同复合音刺激下的两组FFR实验,计算了包络响应FFRENV和时域细节结构响应FFRCAR,得到了一系列听觉相关的结论:幅度谱SNR能去除本底噪声干扰;锁相值对FFR的敏感性好、计算量小;FFRENV和FFRCAR正交;声源的频率可分辨性决定了人对音高的神经元锁相能力;基频的FFRENV和其他谐波的FFRCAR抗噪性好;要提高噪声环境中的听觉认知需要保留声音的时域细节结构,提高人脑的FFRCAR;FFR可能起源于下丘。最后,本文研究了空间域、时间域及频率域中的听觉认知与感知。四个听觉行为学和听觉认知实验的结果表明:音乐训练能提高音高选择性注意力和识别率,其神经机制是音乐训练提高了人脑的时域细节结构响应;但音乐训练不能提高空间听觉;语言训练并不能显著提高听觉神经元的锁相能力;女性的时域细节结构神经元锁相强于男性;先天因素和后天训练都能提高噪声环境中的听觉认知。
Speech and hearing are essential to human communication, learning and understanding. In real environments, however, noise and competing speech are unavoidable and severely degrade speech quality, auditory cognition and perception. Speech enhancement algorithms are an indispensable part of many speech applications. Nevertheless, most algorithms cannot improve objective speech quality and subjective auditory perception at the same time, so new speech enhancement algorithms are needed. Meanwhile, the psychological and cognitive neural mechanisms of hearing in noisy environments remain poorly understood. Research on these mechanisms is a focus of fields such as neuroscience and machine learning, and it can also indicate directions for speech enhancement research. In this study, we first proposed a speech enhancement algorithm based on symmetrical beamforming, FastICA, and sparse coding on cochlear filter banks. The algorithm exploits the differences between speech and noise in spatial location, time-domain statistical characteristics, and frequency-domain auditory coding. Four indices were given for comprehensively evaluating the objective and subjective performance of speech enhancement algorithms. Results show that, across varying input SNRs, directions of arrival and noise types, the objective quality, subjective performance and robustness of the proposed algorithm are all superior to those of the main existing speech enhancement algorithms. Building on symmetrical beamforming, a second speech enhancement algorithm was proposed that combines a fractional delay filter, the envelope modulation spectrum and adaptive filtering. Many speech enhancement algorithms assume that speech comes from the front and noise from the back, which does not match real sound fields. The new algorithm is no longer subject to this restriction.
Results show that, whether or not the desired speech originates from the front, speech processed by the second algorithm has improved waveforms and spectrograms. The algorithm has low computational cost and fast convergence, is little affected by step size and frequency, and is practical to apply. The frequency-following response (FFR) is an important EEG signal for studying hearing, but existing FFR analysis methods are often disturbed by the noise floor. In this study, spectral magnitude SNR and phase-locking value (PLV) were proposed for FFR analysis, together with a method for detecting significant responses. Two FFR experiments with different complex-tone stimuli were carried out, and the envelope-related response (FFRENV) and the temporal-fine-structure (TFS) related response (FFRCAR) were calculated. We arrived at several conclusions: spectral magnitude SNR removes the influence of the noise floor; PLV is sensitive to the FFR and computationally efficient; FFRENV and FFRCAR are orthogonal; the frequency resolvability of the sound source determines the human neural phase-locking ability for pitch; FFRENV at the fundamental frequency and FFRCAR at the other harmonics are robust to noise; improving auditory cognition in noisy environments requires preserving the TFS of sound and enhancing the brain's FFRCAR; and the FFR may originate in the inferior colliculus. Finally, we investigated auditory cognition and perception in the spatial, time and frequency domains. Results of four auditory behavioral and cognitive experiments indicate that: music training improves pitch-selective attention and recognition rate, and its neural mechanism is an enhanced brain response to TFS; music training does not, however, improve spatial hearing; language training does not significantly enhance auditory neural phase locking; neural phase locking to TFS is stronger in females than in males; and both congenital factors and acquired training can improve auditory cognition in noisy environments.
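To make the two FFR measures named above more concrete, the following is a minimal Python sketch of one common way to compute a phase-locking value and a spectral magnitude SNR at a single frequency. It is an illustration under stated assumptions, not the procedure used in this work: the function names, the use of single-trial FFT phases for the PLV, and the estimate of the noise floor from neighboring frequency bins (`n_neighbors`) are all assumptions introduced here.

```python
import numpy as np


def phase_locking_value(trials, fs, freq):
    """PLV across trials at one frequency (0 = random phase, 1 = perfect locking).

    trials: array of shape (n_trials, n_samples); fs: sampling rate in Hz.
    Assumption: the phase is read from the FFT bin nearest to `freq`.
    """
    n = trials.shape[1]
    k = int(round(freq * n / fs))                 # index of the target FFT bin
    phases = np.angle(np.fft.rfft(trials, axis=1)[:, k])
    # Length of the mean unit phasor across trials
    return float(np.abs(np.mean(np.exp(1j * phases))))


def spectral_snr_db(trials, fs, freq, n_neighbors=5):
    """Spectral magnitude SNR (dB) of the trial-averaged response at `freq`.

    Assumption: the noise floor is the mean magnitude of `n_neighbors` bins on
    each side of the target bin, skipping the two immediately adjacent bins.
    The target bin must lie at least n_neighbors + 1 bins from either edge.
    """
    n = trials.shape[1]
    k = int(round(freq * n / fs))
    mag = np.abs(np.fft.rfft(trials.mean(axis=0)))
    noise = np.concatenate([mag[k - n_neighbors - 1:k - 1],
                            mag[k + 2:k + n_neighbors + 2]])
    return float(20.0 * np.log10(mag[k] / noise.mean()))
```

With simulated trials that contain a 100 Hz sinusoid in white noise, the PLV at 100 Hz should approach 1 and the SNR should be well above 0 dB, whereas noise-only trials should give a PLV near 0.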