Why are scipy and librosa different for reading wav file?

I'm trying to get the samples from a wave file, and I noticed that the values differ depending on whether I use scipy or librosa.

import librosa

# librosa.load returns (samples, sample_rate)
sampleFloats, fs = librosa.load('hi.wav', sr=48000)
print('{0:.15f}'.format(sampleFloats[-1]))

from scipy.io.wavfile import read as wavread

# x is a numpy array of integers representing the samples
samplerate, x = wavread('hi.wav')

# scale to -1.0 -- 1.0
if x.dtype == 'int16':
    nb_bits = 16  # -> 16-bit wav files
elif x.dtype == 'int32':
    nb_bits = 32  # -> 32-bit wav files
else:
    raise ValueError('unsupported sample type: {}'.format(x.dtype))
max_nb_bit = float(2 ** (nb_bits - 1))
# samples is a numpy array of floats representing the samples
samples = x / (max_nb_bit + 1.0)

print(samples[-1])

The two print statements output:

0.001251220703125
0.001274064182641886

The sample rate for the file is 48000.

Why might they be different? Is librosa using a different normalization?

It's a type mismatch. It is often useful to print not only a value but also its type. In this case, because of the way the normalisation is done, the values in samples are float64, while librosa returns float32.
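A minimal sketch of that dtype check, using only NumPy so it runs without a wav file. The sample values are made up, and the divisor is simply the one from the question's code. Note that float32 rounding on its own only affects digits beyond roughly the seventh significant figure.

```python
import numpy as np

# Made-up int16 samples standing in for the array returned by wavread.
x = np.array([41, -1200, 32767], dtype=np.int16)

# Dividing an integer array by a Python float yields float64,
# which is what the question's normalisation produces.
samples64 = x / (float(2 ** 15) + 1.0)

# A float32 copy, the dtype the answer attributes to librosa.
samples32 = samples64.astype(np.float32)

# Print the type alongside the value.
print(samples64.dtype, '{0:.15f}'.format(samples64[0]))
print(samples32.dtype, '{0:.15f}'.format(samples32[0]))

# Size of the discrepancy caused by the dtype alone.
dtype_only_diff = abs(float(samples64[0]) - float(samples32[0]))
print(dtype_only_diff)
```

Casting both arrays to the same dtype before comparing, for example with `samples.astype(np.float32)`, removes this particular source of difference.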

This answer can help you work out how to normalise. (Also, as pointed out above, the divisor should indeed be max_nb_bit - 1, not max_nb_bit + 1.)
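To compare the candidate divisors side by side, here is a rough sketch. `pcm_to_float` and its `offset` parameter are hypothetical names introduced for illustration, not part of scipy or librosa, and the sample value 41 is made up. An offset of -1 corresponds to max_nb_bit - 1, +1 to the code in the question, and 0 to plain 2 ** (nb_bits - 1).

```python
import numpy as np

def pcm_to_float(x, offset=0, dtype=np.float32):
    """Hypothetical helper: divide signed integer PCM samples by
    2 ** (nb_bits - 1) + offset and return them as `dtype`."""
    if not np.issubdtype(x.dtype, np.signedinteger):
        raise TypeError('expected signed integer PCM, got {}'.format(x.dtype))
    nb_bits = x.dtype.itemsize * 8
    divisor = float(2 ** (nb_bits - 1) + offset)
    return (x / divisor).astype(dtype)

x = np.array([41], dtype=np.int16)  # made-up sample value

# Print the same sample under each of the three divisors.
for offset in (-1, 0, 1):
    print(offset, '{0:.15f}'.format(pcm_to_float(x, offset)[0]))
```

With these made-up inputs, offset 0 gives 41 / 32768, which prints as 0.001251220703125, the same digits as the librosa output quoted in the question. That is only an observation about the number, so it is worth running the comparison on the actual file before drawing conclusions about either library.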
