How to create a Triangular (Mel) Filter Bank used in MFCC for speech recognition in MATLAB?

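Although there may be built-in functions available, I need to create my own triangular filter bank. My code is below. I am getting NaN values in HMatrix (the filter bank). This is caused by "identical" values in FreqArray, which is used when creating the matrix. I need help with the following: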

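  1. Is my choice of a 44100 Hz sampling frequency correct?
  2. How should I choose the lower frequency (300 Hz) and the upper frequency (8000 Hz) for computing the Mel filter bank matrix?
  3. How do I choose an appropriate frame size (frame_length) and number of Mel filters (no_of_coeffs)?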

function TriFilterBank()
tic
%-----------------------------INITIALISATION---------------------------%

fs=44100;          % frequency at which I have sampled my recorded samples
frame_length=256;  % How to choose an appropriate frame size?
low_freq=300;      % lower frequency for calculation of mel frequency filter bank (I'm unable to choose a correct one, and find the criteria for choosing it)
high_freq=8000;    % upper frequency for calculation of mel frequency filter bank (I'm unable to choose a correct one, and find the criteria for choosing it)
                   % I have also tried with (fs/2)=22050Hz, but no good results
no_of_coeffs=20;   % This is the no. of Mel filter banks to create. How to choose an appropriate value for this for speech processing applications?

%------------------PRE-PROCESSING FOR MEL FILTER BANK CREATION------------------%

low_linear=2595*log10(1+(low_freq/700));
high_linear=2595*log10(1+(high_freq/700));
band_length=(high_linear-low_linear)/(no_of_coeffs+1);

MelArray=zeros(no_of_coeffs+2,1);     % to store mel frequencies to calculate mel frequency filter bank
LinearArray=zeros(no_of_coeffs+2,1);  % to store linear frequencies to calculate mel frequency filter bank
FreqArray=zeros(no_of_coeffs+2,1);    % to store frequency array to calculate mel frequency filter bank
%{
THIS ARRAY MAY HAVE WRONG VALUES DUE TO SELECTION of WRONG PARAMETERS LIKE low_freq, high_freq,
frame_length (frame-size), no_of_coeffs (no. of filter banks).
THIS IS MAJOR REASON BEHIND GENERATION OF NaN values in HMatrix
%}
HMatrix=zeros(no_of_coeffs,frame_length); % Hmk Matrix / Filter Bank. I'M VERY DOUBTFUL OF THE VALUES GENERATED BY THIS FILTER BANK

MelArray(1)=low_linear;
MelArray(no_of_coeffs+2)=high_linear;
LinearArray(1)=low_freq;
LinearArray(no_of_coeffs+2)=high_freq;
FreqArray(1)=floor((int32(frame_length)+1)*LinearArray(1)/fs);
FreqArray(no_of_coeffs+2)=floor((int32(frame_length)+1)*LinearArray(no_of_coeffs+2)/fs);

for m=1:no_of_coeffs
    MelArray(m+1)=MelArray(m)+band_length;
    LinearArray(m+1)=700*((power(10,MelArray(m+1)/2595))-1);
    FreqArray(m+1)=floor((int32(frame_length)+1)*LinearArray(m+1)/fs); % The values generated here seem to be doubtful, hence maybe an incorrect filter bank
end

% THE MOST DOUBTFUL/WRONG PART i.e. MEL FREQUENCY FILTER BANK MATRIX CREATION
%-----------------------------PROBABLE ERRONEOUS PART-----------------------------%
% I'M GETTING NaN values in this matrix, probably due to choosing incorrect parameters like upper freq, lower freq, frame size, no. of filter banks, sampling frequency etc.
% In FreqArray I'm getting two identical values, hence it's satisfying none of the below conditions and generating a NaN value.
for k=1:frame_length
    for m=1:no_of_coeffs
        if(k<FreqArray(m))
            HMatrix(m,k)=0;
        elseif (FreqArray(m)<=k && k<=FreqArray(m+1))
            HMatrix(m,k)=(k-FreqArray(m))/(FreqArray(m+1)-FreqArray(m));
        elseif(FreqArray(m+1)<=k && k<=FreqArray(m+2))
            HMatrix(m,k)=(FreqArray(m+2)-k)/(FreqArray(m+2)-FreqArray(m+1));
        elseif (k>FreqArray(m+2))
            HMatrix(m,k)=0;
        end
    end
end
%---------------------------------------------------------------------------------%
save('TriFilterBank');
toc
end

The TriFilterBank code is based on the following equation:

[Image: Mel filter bank equation]
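The equation image is not available here. Read back from the four branches of the question's loop, it is presumably the triangular filter definition from the referenced guide. In this reconstruction, M = no_of_coeffs, N = frame_length, h(0), ..., h(M+1) are the Hz values in LinearArray, and f(0), ..., f(M+1) are the bin numbers in FreqArray (the code is shifted by one, so FreqArray(1) holds f(0)):

```latex
f(m) = \left\lfloor \frac{(N+1)\,h(m)}{f_s} \right\rfloor, \qquad m = 0, \dots, M+1

H_m(k) =
\begin{cases}
0 & k < f(m-1) \\[4pt]
\dfrac{k - f(m-1)}{f(m) - f(m-1)} & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{f(m+1) - k}{f(m+1) - f(m)} & f(m) \le k \le f(m+1) \\[8pt]
0 & k > f(m+1)
\end{cases}
\qquad m = 1, \dots, M
```

If two neighbouring edges coincide, for example f(m-1) = f(m), then at k = f(m) the second case evaluates 0/0, which is where the NaN values in HMatrix come from.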

The main parts of the output of the above code are shown below for reference.

[Image: HMatrix (FilterBank), part 1]

[Image: HMatrix (FilterBank), part 2]

[Image: HMatrix (FilterBank), part 3]

[Image: LinearArray]

[Image: MelArray]

[Image: FreqArray]

For reference, I used the following website:

http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/

Thanks in advance!

Answer:

Is my choice of a 44100 Hz sampling frequency correct?

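This frequency is fine. The useful content of speech lies mostly below 8 kHz anyway, so a 16 kHz sampling rate (Nyquist frequency 8 kHz) is the more common choice. The blog post you used as a reference also uses 16 kHz.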

How should I choose the lower frequency (300 Hz) and the upper frequency (8000 Hz) for computing the Mel filter bank matrix?

This range is not optimal, but it is fine for most applications. For high-quality sound, use a range of 20 Hz to 7600 Hz.
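Whatever range you pick, a quick check is whether any two adjacent filter edges land in the same FFT bin, since each such repeat produces a 0/0 in the triangle formula. The minimal sketch below uses the same mel conversion and the same bin formula floor((nfft+1)*f/fs) as the question's code, with 20 filters. It compares the question's parameters with the suggested 20 Hz to 7600 Hz range; for the latter, fs = 16000 and nfft = 512 (a 25 ms frame of 400 samples rounded up to a power of two, following the frame-size advice below) are assumptions, and the `configs` table is purely illustrative.

```matlab
% Sketch: do any two adjacent mel filter edges fall into the same FFT bin?
hz2mel = @(f) 2595 * log10(1 + f / 700);       % Hz -> mel, as in the question
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);    % mel -> Hz, as in the question
no_of_coeffs = 20;                             % number of triangular filters

% Rows: [fs, nfft, low_freq, high_freq]
configs = [44100 256 300 8000;     % the question's parameters
           16000 512  20 7600];    % suggested range, with assumed fs and nfft
repeats = zeros(size(configs, 1), 1);

for c = 1:size(configs, 1)
    fs = configs(c, 1);
    nfft = configs(c, 2);
    % no_of_coeffs + 2 equally spaced mel points, mapped back to Hz
    edges_hz = mel2hz(linspace(hz2mel(configs(c, 3)), hz2mel(configs(c, 4)), no_of_coeffs + 2));
    bins = floor((nfft + 1) * edges_hz / fs);  % 0-based FFT bin of each edge
    repeats(c) = sum(diff(bins) == 0);         % each repeat means a zero-width triangle edge
    fprintf('fs = %5d Hz, nfft = %4d, %4d-%4d Hz: %d repeated edge bin(s)\n', ...
            fs, nfft, configs(c, 3), configs(c, 4), repeats(c));
end
```

For the question's settings this reports at least one repeated bin; for the assumed 16 kHz / 512-point setup it reports none, because even the narrowest gap between edges (about 89 Hz) is wider than one bin (about 31 Hz).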

How do I choose an appropriate frame size (frame_length) and number of Mel filters (no_of_coeffs)?

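The frame size for speech is usually about 25 ms, which is the best value for keeping the signal stationary within a frame while still resolving speech at a normal rate. At a sampling rate of 44100 Hz, that comes to about 1103 samples per frame (44100 × 0.025 = 1102.5), not the 256 you chose. If you want a power of 2, round up to 2048 (for example by zero-padding the frame). This will also be the FFT size.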
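The arithmetic in the paragraph above can be sketched as follows, for the question's 44.1 kHz and for the more common 16 kHz. It rounds 25 ms to whole samples and uses MATLAB's `nextpow2` to get the FFT size; the bin spacing fs/nfft is printed because it determines how close two filter edges can be before they fall into the same bin.

```matlab
% Sketch: samples in a 25 ms frame and the next power-of-two FFT size.
frame_ms = 25;                     % typical speech frame duration
fs_list = [44100 16000];           % the question's rate and the more common speech rate
frame_length = zeros(size(fs_list));
nfft = zeros(size(fs_list));

for idx = 1:numel(fs_list)
    frame_length(idx) = round(fs_list(idx) * frame_ms / 1000);  % samples per frame
    nfft(idx) = 2 ^ nextpow2(frame_length(idx));                % zero-pad the frame up to this size
    fprintf('fs = %5d Hz: %4d samples per frame, nfft = %4d, bin spacing %.2f Hz\n', ...
            fs_list(idx), frame_length(idx), nfft(idx), fs_list(idx) / nfft(idx));
end
```

This gives 1103 samples and nfft = 2048 at 44.1 kHz (about 21.5 Hz per bin), and 400 samples and nfft = 512 at 16 kHz (31.25 Hz per bin), compared with 44100/256 ≈ 172 Hz per bin for the question's 256-point frame.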

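The number of Mel filters can be anywhere from 15 to 40; 20 is a good value that is used in many systems and has been found useful in experiments.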
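Putting the numbers from this answer together (44.1 kHz, a 2048-point FFT, 20 filters, and the question's 300 Hz to 8000 Hz range), here is a minimal sketch of a NaN-free version of the question's filter bank. It differs from the question's code in a few deliberate ways: it is vectorised over bins instead of using the k/m double loop, it uses 0-based bin numbers, it keeps only the nfft/2 + 1 non-redundant bins of a real FFT rather than all frame_length bins, and it skips a triangle edge of zero width instead of dividing by zero. The guard and all variable names other than HMatrix are illustrative choices, not taken from the question or from VoiceBox.

```matlab
% Sketch: triangular mel filter bank without NaN values.
fs = 44100;            % sampling rate (the question's value)
nfft = 2048;           % FFT size: 25 ms frame at 44.1 kHz, zero-padded
low_freq = 300;        % lower edge of the filter bank, Hz
high_freq = 8000;      % upper edge of the filter bank, Hz
no_of_coeffs = 20;     % number of mel filters

hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

% no_of_coeffs + 2 edges, equally spaced on the mel scale
edges_hz = mel2hz(linspace(hz2mel(low_freq), hz2mel(high_freq), no_of_coeffs + 2));
bins = floor((nfft + 1) * edges_hz / fs);   % 0-based FFT bin of each edge

num_bins = nfft / 2 + 1;                    % bins 0 .. nfft/2 of a real FFT
HMatrix = zeros(no_of_coeffs, num_bins);
k = 0:num_bins - 1;                         % 0-based bin index of each column

for m = 1:no_of_coeffs
    left = bins(m); center = bins(m + 1); right = bins(m + 2);
    if center > left       % rising edge; skipped if it has zero width (avoids 0/0)
        rising = (k >= left) & (k <= center);
        HMatrix(m, rising) = (k(rising) - left) / (center - left);
    end
    if right > center      % falling edge; skipped if it has zero width (avoids 0/0)
        falling = (k >= center) & (k <= right);
        HMatrix(m, falling) = (right - k(falling)) / (right - center);
    end
end
```

To apply it to one frame, with a hypothetical column vector `frame` of windowed samples, the filter-bank energies would be `HMatrix * abs(X(1:num_bins)).^2` where `X = fft(frame, nfft)`; taking the log and a DCT of those energies then gives the MFCCs, as described in the referenced guide.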

It is best to read existing implementations; there are many things you won't learn from this tutorial. A good one is VoiceBox.


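Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465 © 2020-2024 STACKOOM.COM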