Reading a wav file with scipy and librosa in Python
I am trying to load a .wav file in Python using the scipy package. My final objective is to create the spectrogram of that audio file. The code for reading the file can be summarized as follows:
import scipy.io.wavfile as wav
(rate, sig) = wav.read(_wav_file_)  # scipy returns (sample_rate, data)
For some .wav files I receive the following warning and error:
WavFileWarning: Chunk (non-data) not understood, skipping it.
ValueError: Incomplete wav chunk.
Therefore, I decided to use librosa to read the files instead:
import librosa
(sig, rate) = librosa.load(_wav_file_, sr=None)
That works properly in all cases; however, I noticed a difference in the colors of the spectrogram. It was exactly the same figure, but somehow the colors were inverted. More specifically, the difference appears when I keep the same function for computing the spectrogram and change only the way I read the .wav file. Any idea what could cause this? Is there a default difference between the way the two approaches read the .wav file?
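To make the default difference concrete, here is a minimal sketch that avoids plotting altogether. It uses only the standard-library wave module and NumPy (not scipy or librosa themselves), so it is an approximation: it writes a short 16-bit test tone to an in-memory file, reads the raw samples back as int16 (the dtype scipy.io.wavfile.read returns for 16-bit PCM) and rescales them to float32 by dividing by 2**15, which is the scaling librosa's buf_to_float applies to 2-byte samples. The 16000 Hz rate and the 440 Hz tone are arbitrary assumptions. Keep in mind that librosa.load also resamples to 22050 Hz and mixes down to mono by default unless sr=None and mono=False are passed.

```python
import io
import wave

import numpy as np

RATE = 16000  # assumed sample rate of the test file

# Build a 0.1 s, 440 Hz test tone at half of full scale, stored as 16-bit PCM.
t = np.arange(int(0.1 * RATE)) / RATE
pcm = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample = 16-bit
    w.setframerate(RATE)
    w.writeframes(pcm.tobytes())

# Read the raw frames back, as an integer reader such as scipy's would.
buf.seek(0)
with wave.open(buf, "rb") as r:
    raw = r.readframes(r.getnframes())
    rate = r.getframerate()

int_sig = np.frombuffer(raw, dtype="<i2")         # int16, range [-32768, 32767]
float_sig = int_sig.astype(np.float32) / 2 ** 15  # float32, range [-1, 1)

print(rate, int_sig.dtype, float_sig.dtype)
print(int_sig.max(), float_sig.max())
```

Both paths hold the same samples; only the dtype and a constant factor of 32768 differ, so any spectrogram computed from them differs by a constant gain.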
EDIT:
import numpy as np
import librosa
import scipy.io.wavfile as wav
(rate1, sig1) = wav.read(spec_file)  # rate1 = 16000
sig, rate = librosa.load(spec_file)  # rate = 22050 (librosa's default sr)
sig = np.array(α * sig, dtype="int16")
Something that almost worked was to multiply sig by a constant α, the ratio between the maximum values of the signal from scipy's wavfile.read and the signal from librosa. Still, the sample rates were different.
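Taking the ratio of the two maxima depends on just two samples. A minimal sketch of an alternative, assuming both arrays hold the same samples at the same rate, is a least-squares fit of α over all samples. estimate_alpha is a hypothetical helper, not part of scipy or librosa. For 16-bit files read both ways it should come out close to 2**15 = 32768; a length mismatch is itself a hint that one reader resampled the audio, as librosa.load does (to 22050 Hz) unless sr=None is given.

```python
import numpy as np


def estimate_alpha(int_sig, float_sig):
    """Least-squares scale factor alpha such that alpha * float_sig ~ int_sig.

    Hypothetical helper: it only makes sense when both arrays hold the same
    samples at the same sample rate. A length mismatch usually means one
    reader resampled the audio (e.g. librosa.load without sr=None).
    """
    int_sig = np.asarray(int_sig, dtype=np.float64)
    float_sig = np.asarray(float_sig, dtype=np.float64)
    if int_sig.shape != float_sig.shape:
        raise ValueError(
            f"shape mismatch {int_sig.shape} vs {float_sig.shape}: "
            "were both files read at the same sample rate?"
        )
    # Minimizes ||alpha * float_sig - int_sig||^2, using every sample
    # instead of only the two peak values.
    return float(np.dot(float_sig, int_sig) / np.dot(float_sig, float_sig))


ints = np.array([-1392, 7998, 21127, -12011], dtype=np.int16)
floats = ints.astype(np.float32) / 2 ** 15
print(estimate_alpha(ints, floats))  # close to 32768
```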
This sounds like a quantization problem. If samples in the wave file are stored as float and librosa is just performing a straight cast to an int, any value whose magnitude is less than 1 will be truncated to 0. More than likely, this is why sig is an array of all zeros. The float must be scaled to map it into the range of an int. For example,
>>> import numpy as np
>>> a = np.random.randn(10)
>>> a
array([-0.04250369, 0.244113 , 0.64479281, -0.3665814 , -0.2836227 ,
-0.27808428, -0.07668698, -1.3104602 , 0.95253315, -0.56778205])
Convert a to type int without scaling:
>>> a.astype(int)
array([ 0,  0,  0,  0,  0,  0,  0, -1,  0,  0])
Convert a to int with scaling for a 16-bit integer:
>>> b = (a* 32767).astype(int)
>>> b
array([ -1392, 7998, 21127, -12011, -9293, -9111, -2512, -42939,
31211, -18604])
Convert the scaled int back to float:
>>> c = b/32767.0
>>> c
array([-0.04248177, 0.24408704, 0.64476455, -0.36655782, -0.28360851,
-0.27805414, -0.0766625 , -1.31043428, 0.9525132 , -0.56776635])
c and a agree to only about 3 or 4 decimal places because of the quantization to int.
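The 3 or 4 matching decimal places follow from the step size: one int16 step at this scale is 1/32767, about 3.05e-5. A small sketch (NumPy only, with a random test signal as an assumption) comparing a bare astype, which truncates toward zero, with rounding to the nearest integer first:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, 10_000)  # test signal within [-1, 1)

scale = 32767
truncated = (a * scale).astype(np.int16)        # astype truncates toward zero
rounded = np.round(a * scale).astype(np.int16)  # round to nearest first

# Worst-case round-trip error after converting back to float.
err_trunc = np.abs(truncated / scale - a).max()
err_round = np.abs(rounded / scale - a).max()

print(err_trunc, 1 / scale)    # truncation error is below 1/32767
print(err_round, 0.5 / scale)  # rounding error is at most 0.5/32767
```

Rounding halves the worst-case error, but either way the round trip is only accurate to roughly four decimal places.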
If librosa is returning a float, you can scale it by 2**15 and cast it to an int to get the same range of values that the scipy wav reader returns. Since librosa returns floats, the values will most likely lie within a much smaller range, such as [-1, +1], than those of a 16-bit integer, which lie in [-32768, +32767]. So you need to scale one of them to make the ranges match. For example,
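import librosa
sig, rate = librosa.load(spec_file, sr=None, mono=True)  # sr=None keeps the native sample rate
sig = sig * 32767  # scale [-1, 1] floats to the 16-bit integer range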
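The snippet above multiplies by 32767 but leaves sig as a float array. If an actual int16 array like scipy's is wanted, here is a minimal sketch; float_to_int16 is a hypothetical helper name, and the rounding and clipping are design choices rather than anything librosa does for you.

```python
import numpy as np


def float_to_int16(sig):
    """Hypothetical helper: map float samples in [-1, 1] to int16.

    Scaling by 32767 (not 2**15) keeps +1.0 inside the int16 range,
    rounding avoids the truncation of a bare astype, and clipping guards
    against samples slightly outside [-1, 1]; casting an out-of-range
    float to int16 does not give a well-defined result.
    """
    scaled = np.round(np.asarray(sig, dtype=np.float64) * 32767)
    return np.clip(scaled, -32768, 32767).astype(np.int16)


print(float_to_int16([-1.0, -0.5, 0.0, 0.5, 1.0, 1.2]))
```

Scaling by 2**15 instead maps +1.0 to 32768, which no longer fits in int16; that is why the sketch uses 32767 and clips.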
If you do not want to do the quantization yourself, you could use the pylab.specgram function to do it for you. You can look inside the function to see how it uses vmin and vmax.
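On the vmin/vmax point: multiplying a signal by a constant k multiplies its power spectrum by k squared, which shifts every bin by 20*log10(k) dB. A NumPy-only sketch for a single Hann-windowed frame, assuming k = 32768 (the int16 vs float scale) and a random test frame: with fixed vmin/vmax that offset slides every pixel along the colormap, while with color limits taken from the data it cancels out.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 1024)  # float frame, like librosa.load returns
k = 32768.0                       # scale to the int16 range
window = np.hanning(len(x))


def power_db(sig):
    """Power spectrum of one windowed frame, in dB."""
    spectrum = np.abs(np.fft.rfft(sig * window)) ** 2
    return 10 * np.log10(spectrum + 1e-30)  # tiny floor avoids log10(0)


offset = power_db(k * x) - power_db(x)
print(offset.min(), offset.max(), 20 * np.log10(k))
```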
It is not completely clear from your post (at least to me) what you want to achieve, as you have provided neither a sample input file nor a script. Anyway, to check whether the spectrogram of a wave file differs significantly depending on whether the signal data returned by the read function is float32 or int, I tested the following three functions:
_wav_file_ = "africa-toto.wav"

def spectogram_librosa(_wav_file_):
    import librosa
    import pylab
    import numpy as np
    (sig, rate) = librosa.load(_wav_file_, sr=None, mono=True, dtype=np.float32)
    pylab.specgram(sig, Fs=rate)
    pylab.savefig('spectrogram3.png')

def graph_spectrogram_wave(wav_file):
    import wave
    import numpy as np
    import pylab
    def get_wav_info(wav_file):
        wav = wave.open(wav_file, 'r')
        frames = wav.readframes(-1)
        # np.frombuffer replaces the deprecated binary mode of fromstring
        sound_info = np.frombuffer(frames, dtype=np.int16)
        frame_rate = wav.getframerate()
        wav.close()
        return sound_info, frame_rate
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=3, figsize=(10, 6))
    pylab.title('spectrogram pylab with wav_file')
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram2.png')

def graph_wavfileread(_wav_file_):
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile
    import numpy as np
    sample_rate, samples = wavfile.read(_wav_file_)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nfft=1024)
    plt.pcolormesh(times, frequencies, 10*np.log10(spectrogram))
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    plt.savefig("spectogram1.png")

spectogram_librosa(_wav_file_)
#graph_wavfileread(_wav_file_)
#graph_spectrogram_wave(_wav_file_)
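Comparing the saved images by eye is imprecise. As a rough numerical check, here is a sketch that uses a plain-NumPy STFT as a stand-in for pylab.specgram and scipy.signal.spectrogram (it is not identical to either: no detrending, no density scaling) and normalizes each spectrogram so its loudest bin is 0 dB. The random int16 test signal is an assumption in place of africa-toto.wav. After normalization, the int16 and float versions of the same samples should agree to floating-point precision; a larger difference would have to come from something else, such as resampling, channel mixing or clipping.

```python
import numpy as np


def spectrogram_db(sig, nfft=256, hop=128):
    """Magnitude-squared STFT in dB, normalized so the loudest bin is 0 dB.

    A plain-NumPy stand-in for scipy.signal.spectrogram / pylab.specgram,
    used here only to compare two versions of the same signal numerically.
    """
    sig = np.asarray(sig, dtype=np.float64)
    window = np.hanning(nfft)
    starts = range(0, len(sig) - nfft + 1, hop)
    frames = np.stack([sig[s:s + nfft] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    db = 10 * np.log10(power + 1e-20)  # tiny floor avoids log10(0)
    return db - db.max()


rng = np.random.default_rng(2)
int_sig = rng.integers(-20000, 20000, 4096).astype(np.int16)  # like scipy's output
float_sig = int_sig.astype(np.float32) / 2 ** 15              # like librosa's output

diff = np.abs(spectrogram_db(int_sig) - spectrogram_db(float_sig)).max()
print(diff)  # tiny: the two versions differ only by a constant gain
```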
Apart from minor differences in size and intensity, the resulting spectrograms look quite similar no matter the read method, library, or data type. That makes me wonder a little why the outputs need to be 'exactly' the same, and how exact they should be.
The librosa.load() function offers a dtype parameter, but it only works with float values anyway. Googling this led me only to this issue, which wasn't much help, and this issue, which says that this is how it will stay in librosa, since internally it seems to use only floats.

To add to what has been said, librosa has a utility to convert integer arrays to floats:
float_audio = librosa.util.buf_to_float(sig)
I have used this with great success when producing spectrograms of Pydub AudioSegments. Keep in mind that one of its arguments is the number of bytes per sample, which defaults to 2. You can read more about it in the documentation here. Here is the source code:
import numpy as np

def buf_to_float(x, n_bytes=2, dtype=np.float32):
    """Convert an integer buffer to floating point values.
    This is primarily useful when loading integer-valued wav data
    into numpy arrays.

    See Also
    --------
    buf_to_float

    Parameters
    ----------
    x : np.ndarray [dtype=int]
        The integer-valued data buffer

    n_bytes : int [1, 2, 4]
        The number of bytes per sample in `x`

    dtype : numeric type
        The target output type (default: 32-bit float)

    Returns
    -------
    x_float : np.ndarray [dtype=float]
        The input data buffer cast to floating point
    """

    # Invert the scale of the data
    scale = 1./float(1 << ((8 * n_bytes) - 1))

    # Construct the format string
    fmt = '<i{:d}'.format(n_bytes)

    # Rescale and format the data buffer
    return scale * np.frombuffer(x, fmt).astype(dtype)
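A short usage sketch for the n_bytes argument. To run without librosa installed, it uses a local copy of the buf_to_float logic quoted above (docstring dropped), so treat it as an illustration of that code rather than a test of librosa itself. With Pydub, one would presumably pass segment.raw_data with n_bytes=segment.sample_width; that combination is an assumption and is not exercised here.

```python
import numpy as np


# Local copy of the buf_to_float logic quoted above, so the example runs
# without librosa installed.
def buf_to_float(x, n_bytes=2, dtype=np.float32):
    scale = 1.0 / float(1 << ((8 * n_bytes) - 1))  # 1 / 2**(bits - 1)
    fmt = "<i{:d}".format(n_bytes)                 # little-endian signed int
    return scale * np.frombuffer(x, fmt).astype(dtype)


pcm16 = np.array([-32768, -16384, 0, 16384, 32767], dtype="<i2")
pcm32 = np.array([-2 ** 31, 0, 2 ** 30], dtype="<i4")

out16 = buf_to_float(pcm16.tobytes())             # default n_bytes=2 for 16-bit PCM
out32 = buf_to_float(pcm32.tobytes(), n_bytes=4)  # 32-bit PCM needs n_bytes=4

print(out16)
print(out32)
```

Passing the wrong n_bytes does not necessarily fail loudly: the bytes are simply regrouped into samples of the wrong width, so it is worth matching it to the file's actual sample width.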
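Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.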