Reading a wav file with scipy and librosa in python

I am trying to load a .wav file in Python using the scipy package. My final objective is to create the spectrogram of that audio file. The code for reading the file can be summarized as follows:

import scipy.io.wavfile as wav
(rate, sig) = wav.read(_wav_file_)  # scipy returns (rate, data), in that order

For some .wav files I am receiving the following error:

WavFileWarning: Chunk (non-data) not understood, skipping it.
ValueError: Incomplete wav chunk.

Therefore, I decided to read the files with librosa instead:

import librosa
(sig, rate) = librosa.load(_wav_file_, sr=None)

That works properly in all cases. However, I noticed a difference in the colors of the spectrogram: the figure was exactly the same, but the colors were somehow inverted. More specifically, the difference appeared when I kept the same function for computing the spectrograms and changed only the way I read the .wav file. Any idea what can cause that? Is there a default difference between the way the two approaches read the .wav file?

EDIT:

(rate1, sig1) = wav.read(spec_file) # rate1 = 16000
sig, rate = librosa.load(spec_file) # rate = 22050 (librosa's default sr; pass sr=None to keep the file's own rate)
sig = np.array(α*sig, dtype = "int16") 

Something that almost worked was to multiply sig by a constant α, the ratio between the maximum value of the signal read by scipy and the maximum value of the signal returned by librosa. The sampling rates were still different, though.

This sounds like a quantization problem. If the samples in the wave file are stored as float and librosa is just performing a straight cast to an int, any value with magnitude less than 1 will be truncated to 0. More than likely, this is why sig is an array of all zeros. The float must be scaled to map it into the range of an int. For example,

>>> import numpy as np
>>> a = np.random.randn(10)
>>> a
array([-0.04250369,  0.244113  ,  0.64479281, -0.3665814 , -0.2836227 ,
       -0.27808428, -0.07668698, -1.3104602 ,  0.95253315, -0.56778205])

Convert a to type int without scaling

>>> a.astype(int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Convert a to int with scaling for a 16-bit integer

>>> b = (a* 32767).astype(int)
>>> b
array([ -1392,   7998,  21127, -12011,  -9293,  -9111,  -2512, -42939,
        31211, -18604])

Convert the scaled int back to float

>>> c = b/32767.0
>>> c
array([-0.04248177,  0.24408704,  0.64476455, -0.36655782, -0.28360851,
       -0.27805414, -0.0766625 , -1.31043428,  0.9525132 , -0.56776635])

c and a agree only to about 3 or 4 decimal places because of the quantization to int.

If librosa is returning a float, you can scale it by 2**15 and cast it to an int to get the same range of values that the scipy wave reader returns. Since librosa returns a float, chances are the values lie within a much smaller range, such as [-1, +1], than a 16-bit integer, which lies in [-32768, +32767]. So you need to scale one of them to get the ranges to match. For example,

sig, rate = librosa.load(spec_file, mono=True)
sig = sig * 32767
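
To make the scaling step concrete, here is a minimal sketch using only NumPy. The helper names float_to_int16 and int16_to_float are hypothetical (they are not part of scipy or librosa), and the sketch assumes the float samples are nominally in [-1.0, 1.0], as described above. It clips before scaling so that out-of-range samples saturate instead of wrapping around.

```python
import numpy as np

def float_to_int16(sig):
    """Map float samples in [-1.0, 1.0] to 16-bit integers (hypothetical helper)."""
    sig = np.asarray(sig, dtype=np.float64)
    # Clip first so out-of-range samples saturate instead of overflowing.
    clipped = np.clip(sig, -1.0, 1.0)
    # Scale to the positive int16 limit and round to the nearest integer.
    return np.round(clipped * 32767).astype(np.int16)

def int16_to_float(samples):
    """Map 16-bit integers back to floats in [-1.0, 1.0] (hypothetical helper)."""
    return np.asarray(samples, dtype=np.float32) / 32767.0

example = np.array([-1.5, -1.0, -0.5, 0.0, 0.25, 1.0])
as_int = float_to_int16(example)
round_trip = int16_to_float(as_int)
print(as_int)
print(round_trip)
```

The round trip is only accurate to about one quantization step (1/32767), which is the same loss of precision shown in the interactive session above.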
  • If you do not want to do the quantization yourself, you could use the pylab.specgram function to do it for you. You can look inside the function and see how it uses vmin and vmax.

  • It is not completely clear from your post (at least to me) what you want to achieve, since you provided neither a sample input file nor a script. Anyway, to check whether the spectrogram of a wave file differs significantly depending on whether the signal data returned by the read function is float32 or int, I tested the following 3 functions.

Python Script:

_wav_file_ = "africa-toto.wav"

def spectogram_librosa(_wav_file_):
    import librosa
    import pylab
    import numpy as np

    (sig, rate) = librosa.load(_wav_file_, sr=None, mono=True,  dtype=np.float32)
    pylab.specgram(sig, Fs=rate)
    pylab.savefig('spectrogram3.png')

def graph_spectrogram_wave(wav_file):
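    import wave
    import numpy as np
    import pylab
    def get_wav_info(wav_file):
        wav = wave.open(wav_file, 'r')
        frames = wav.readframes(-1)
        # np.frombuffer replaces the deprecated fromstring; assumes 16-bit PCM samples
        sound_info = np.frombuffer(frames, dtype=np.int16)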
        frame_rate = wav.getframerate()
        wav.close()
        return sound_info, frame_rate
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=3, figsize=(10, 6))
    pylab.title('spectrogram pylab with wav_file')
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram2.png')


def graph_wavfileread(_wav_file_):
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile
    import numpy as np   
    sample_rate, samples = wavfile.read(_wav_file_)   
    frequencies, times, spectrogram = signal.spectrogram(samples,sample_rate,nfft=1024)
    plt.pcolormesh(times, frequencies, 10*np.log10(spectrogram))
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    plt.savefig("spectogram1.png")


spectogram_librosa(_wav_file_)
#graph_wavfileread(_wav_file_)
#graph_spectrogram_wave(_wav_file_)
These produced the following 3 outputs:

[Three spectrogram images, not reproduced here]

Apart from minor differences in size and intensity, they look quite similar regardless of the read method, library, or data type. That makes me wonder why the outputs need to be 'exactly' the same, and how exact they should be.
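
The difference in intensity is what one would expect from the scaling alone: multiplying a signal by a constant k shifts every bin of a dB-scaled spectrogram by 20*log10(k) and leaves its shape unchanged. The sketch below illustrates this with NumPy only. The power_db function is a hypothetical bare-bones spectrogram (non-overlapping Hann-windowed frames), not the computation used by pylab or scipy, and the random test signal is an assumption standing in for real audio.

```python
import numpy as np

def power_db(sig, n_fft=256):
    """Power in dB of non-overlapping Hann-windowed frames (hypothetical helper)."""
    sig = np.asarray(sig, dtype=np.float64)
    n_frames = len(sig) // n_fft
    frames = sig[: n_frames * n_fft].reshape(n_frames, n_fft)
    spectrum = np.fft.rfft(frames * np.hanning(n_fft), axis=1)
    # A small floor avoids log10(0) for empty bins.
    return 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-20)

rng = np.random.default_rng(0)
# Float signal well inside [-1, 1], like the output of a float-returning reader.
float_sig = 0.1 * rng.standard_normal(16000)
# The same signal as 16-bit integers, like the output of an int-returning reader.
int_sig = np.round(float_sig * 32767).astype(np.int16)

diff = power_db(int_sig) - power_db(float_sig)
expected = 20.0 * np.log10(32767.0)
print(np.median(diff), expected)
```

Because the two spectrograms differ by a roughly constant offset, any plotting routine that normalizes colors to the data range should draw nearly the same picture for both.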

  • I do find it strange, though, that the librosa.load() function offers a dtype parameter but works only with float values anyway. Googling this led me only to this issue, which wasn't much help, and this issue says that this is how it will stay in librosa, since internally it seems to use only floats.

To add to what has been said, librosa has a utility that converts integer arrays to floats.

float_audio = librosa.util.buf_to_float(sig)

I have used this with great success when producing spectrograms of Pydub audio segments. Keep in mind that one of its arguments is the number of bytes per sample, which defaults to 2. You can read more about it in the documentation here. Here is the source code:

def buf_to_float(x, n_bytes=2, dtype=np.float32):
    """Convert an integer buffer to floating point values.
    This is primarily useful when loading integer-valued wav data
    into numpy arrays.
    See Also
    --------
    buf_to_float
    Parameters
    ----------
    x : np.ndarray [dtype=int]
        The integer-valued data buffer
    n_bytes : int [1, 2, 4]
        The number of bytes per sample in `x`
    dtype : numeric type
        The target output type (default: 32-bit float)
    Returns
    -------
    x_float : np.ndarray [dtype=float]
        The input data buffer cast to floating point
    """

    # Invert the scale of the data
    scale = 1./float(1 << ((8 * n_bytes) - 1))

    # Construct the format string
    fmt = '<i{:d}'.format(n_bytes)

    # Rescale and format the data buffer
    return scale * np.frombuffer(x, fmt).astype(dtype)
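
As a quick check of the scaling in the listing above, the sketch below repeats the same arithmetic with NumPy alone, so it runs without librosa installed. The name int_buffer_to_float is hypothetical, and the input is assumed to be little-endian 16-bit PCM, matching the default n_bytes=2.

```python
import numpy as np

def int_buffer_to_float(buf, n_bytes=2, dtype=np.float32):
    """Same arithmetic as the listing above, restated for illustration."""
    # For n_bytes=2 the scale is 1 / 2**15 = 1 / 32768.
    scale = 1.0 / float(1 << ((8 * n_bytes) - 1))
    # '<i2' means little-endian signed 16-bit integers.
    fmt = "<i{:d}".format(n_bytes)
    return scale * np.frombuffer(buf, fmt).astype(dtype)

# Raw bytes such as those returned by wave.readframes() for 16-bit audio.
raw = np.array([-32768, -16384, 0, 16384, 32767], dtype="<i2").tobytes()
floats = int_buffer_to_float(raw)
print(floats)
```

Note that the divisor is 32768 rather than 32767, so the most negative sample maps to exactly -1.0 while the most positive sample stays just below +1.0.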
