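Voice wake-up, also known as voice activity detection (VAD), is an important part of speech processing and is widely used in communications, intelligent voice applications and related fields. VAD is a key technology that affects the overall performance of the downstream system. Moreover, in complex speech processing systems, the VAD module usually controls the power gating of the other functional modules, which keeps the system from staying in a high-power, always-on state. A low-power VAD system is therefore essential for reducing the energy consumption of the speech processing system as a whole.

This paper first introduces the working principle and typical architecture of VAD systems. It then proposes a new VAD algorithm and designs a corresponding VAD chip based on that algorithm.

In the algorithm design, time-domain subband energy is used as the main speech classification feature, and computational complexity is lowered by reducing the number of subbands. A new feature combination is also proposed: a recursively averaged short-time energy feature is added to correct the algorithm's misclassifications in speech transition regions and improve accuracy. Classification is performed with a linear support vector machine (SVM), and the weight precision is limited to further reduce the amount of computation. The algorithm was evaluated on test data built from the TIMIT dataset and the NOISEX-92 and MUSAN noise corpora. At a signal-to-noise ratio (SNR) of 10 dB, the speech classification accuracy exceeds 93% with a single noise type and reaches 90.4% with mixed noise.

In the chip design, the circuits are designed primarily for low power. First, feature extraction is performed with analog circuits, which addresses the problem that the analog-to-digital converter (ADC) accounts for most of the power in conventional designs. Second, an envelope extraction circuit replaces the integrator in the speech energy computation, saving the power of the operational amplifier modules. Third, some circuits exploit subthreshold conduction and operate in the subthreshold region to further reduce power. The classifier circuit is also designed for a low supply voltage to reduce system power. Circuit design, post-layout simulation verification and tape-out have been completed in a 0.18 μm CMOS process. The core circuit area of the chip is 1.68 mm². Post-layout simulation shows that the VAD chip consumes 752 nW at a 0.8 V supply. In classification tests on 10 dB SNR speech with mixed background noise, the speech classification accuracy reaches 90%.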
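To make the algorithm described above concrete, the following is a minimal behavioral sketch in Python, not the author's implementation. The abstract specifies only that a reduced set of time-domain subband energies and a recursively averaged short-time energy feed a linear SVM whose weights have limited precision. Everything else is an assumption: the 16 kHz sample rate, the 10 ms frame, the four hypothetical subbands (center frequency and Q), the RBJ-cookbook band-pass biquads, the averaging factor `lam`, the 4-bit symmetric weight grid, and the helper names `frame_features`, `quantize` and `classify`.

```python
import math

FS = 16_000      # sample rate in Hz (TIMIT audio is 16 kHz)
FRAME = 160      # assumed 10 ms frame at 16 kHz
# Hypothetical reduced filter bank: (center frequency in Hz, Q) per subband
BANDS = [(300, 2.0), (800, 2.0), (1600, 2.0), (3200, 2.0)]


def bandpass_coeffs(fc, q, fs=FS):
    """Band-pass biquad, RBJ Audio EQ Cookbook form with 0 dB peak gain."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)  # (a1, a2), normalized by a0
    return b, a


def biquad(x, b, a):
    """Direct-form I biquad filter applied sample by sample."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        out.append(yn)
    return out


def frame_features(x, lam=0.9):
    """Per frame: mean-square energy of each subband plus the recursively
    averaged short-time energy avg[n] = lam * avg[n-1] + (1 - lam) * E[n]."""
    bands = [biquad(x, *bandpass_coeffs(fc, q)) for fc, q in BANDS]
    feats, avg = [], None
    for start in range(0, len(x) - FRAME + 1, FRAME):
        seg = slice(start, start + FRAME)
        sub = [sum(v * v for v in y[seg]) / FRAME for y in bands]
        ste = sum(v * v for v in x[seg]) / FRAME  # short-time energy E[n]
        avg = ste if avg is None else lam * avg + (1 - lam) * ste
        feats.append(sub + [avg])
    return feats


def quantize(w, bits=4, w_max=1.0):
    """Round weights to a symmetric signed grid of 2**(bits-1) - 1 levels per side."""
    levels = 2 ** (bits - 1) - 1
    step = w_max / levels
    return [max(-levels, min(levels, round(v / step))) * step for v in w]


def classify(feat, w, b):
    """Linear SVM decision function: speech if w . f + b > 0."""
    return sum(wi * fi for wi, fi in zip(w, feat)) + b > 0
```

In use, `classify(frame_features(x)[-1], quantize(w_trained), b_trained)` would label the last frame of `x`, where `w_trained` and `b_trained` are placeholders for a linear SVM trained offline on labeled frames. The recursive average changes slowly from frame to frame, which is one plausible reason it helps in speech transition regions where instantaneous subband energies fluctuate.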
Voice activity detection (VAD) is widely used in communications, intelligent voice applications and other fields. VAD is a key technology that affects the performance of the other speech processing modules in a system. In a complex speech processing system, the VAD module usually controls the power gating of the other functional modules, so that the system does not stay in a high-power, always-on state. A low-power VAD is therefore essential for reducing the overall energy consumption of the system.

This paper introduces the working principle and typical architecture of VAD systems. It then proposes an algorithm based on a new feature combination and designs a corresponding voice activity detector based on that algorithm.

In the VAD algorithm of this work, time-domain subband energy is used as the speech classification feature, and computational complexity is lowered by reducing the number of subbands. We also propose a new feature combination, adding a recursively averaged short-time energy feature to correct the algorithm's misclassifications in speech transition regions and improve accuracy. Classification is performed by a linear support vector machine (SVM), and the weight precision is limited to further reduce the amount of computation. The algorithm is evaluated on test data composed of the TIMIT dataset and the NOISEX-92 and MUSAN noise corpora. At a signal-to-noise ratio (SNR) of 10 dB, the speech classification accuracy is higher than 93% with a single noise type and reaches 90.4% with mixed noise.

In the detector design of this work, the circuits are designed primarily for low power consumption.
First, analog feature extraction circuits are used, which solves the problem that the analog-to-digital converter (ADC) accounts for most of the power consumption in traditional designs. Then an envelope extraction circuit replaces the integrator in the energy calculation, which reduces the number of operational amplifiers and the power they consume. In addition, some circuits operate in the subthreshold region to further reduce power consumption. The classifier circuit is also designed for a low operating voltage to reduce system power consumption. Circuit design, post-layout simulation verification and tape-out have been completed in a 0.18 μm CMOS process. The core circuit area of the chip is 1.68 mm². Post-layout simulation results show that the detector consumes 752 nW from a 0.8 V supply. In classification tests on speech at a 10 dB signal-to-noise ratio with mixed background noise, the detector achieves 90% speech classification accuracy.
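The substitution of an envelope extraction circuit for the integrator can be illustrated with a discrete-time behavioral model. This is a hypothetical sketch, not the chip's circuit: the `attack` and `release` coefficients, the 160-sample frame and the function names are assumptions, and analog non-idealities are ignored. The envelope path rectifies the signal and smooths it with a fast attack and slow release, loosely analogous to a diode-RC detector, while `integrated_energy` is the integrator-style reference that averages x² over each frame.

```python
import math


def envelope(x, attack=0.5, release=0.01):
    """Behavioral envelope detector: full-wave rectify, then smooth with a
    fast attack and slow release (loosely analogous to a diode-RC detector).
    attack and release are hypothetical per-sample coefficients in (0, 1]."""
    env, out = 0.0, []
    for v in x:
        r = abs(v)                          # full-wave rectification
        k = attack if r > env else release  # charge fast, discharge slowly
        env += k * (r - env)                # one-pole step toward |x|
        out.append(env)
    return out


def integrated_energy(x, frame=160):
    """Integrator-style reference: mean of x**2 over each frame."""
    return [sum(v * v for v in x[i:i + frame]) / frame
            for i in range(0, len(x) - frame + 1, frame)]


def frame_envelope(env, frame=160):
    """Sample the envelope at the end of each frame, once per decision."""
    return [env[i + frame - 1] for i in range(0, len(env) - frame + 1, frame)]


if __name__ == "__main__":
    # 1 kHz tone at 16 kHz whose amplitude steps from 0.1 to 1.0 halfway through
    x = [(0.1 if n < 800 else 1.0) * math.sin(2 * math.pi * 1000 * n / 16000)
         for n in range(1600)]
    print(frame_envelope(envelope(x)))
    print(integrated_energy(x))
```

Both outputs rise sharply at the amplitude step. The envelope follows signal amplitude, whereas the integrator follows its square, so in such a design the classifier weights would be trained on whichever representation the hardware actually produces. The envelope path needs only a rectifier and a smoothing element per frame decision, which is consistent with the abstract's point that it avoids the operational amplifier an integrator requires.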