
How to extract data from a wav file using python matplotlib library?

I'm trying to extract data from a WAV file for audio analysis of each frequency and its amplitude with respect to time. My aim is to run this data through a machine learning algorithm for a college project. After a bit of googling I found out that this can be done with Python's matplotlib library, and I saw some sample code that ran a short-time Fourier transform and plotted a spectrogram of these WAV files, but I wasn't able to understand how to use this library to extract the data (the amplitude of every frequency at a given time in the audio file) and store it in a 3D array or a .mat file. Here's the code I saw on some website:

#!/usr/bin/env python

""" This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Frank Zalkow, 2012-2013 """

import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks

""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)    
    # cols for windowing
    cols = int(np.ceil((len(samples) - frameSize) / float(hopSize))) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win


    return np.fft.rfft(frames)    

""" scale frequency axis logarithmically """    
def logscale_spec(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)

    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale)).astype(int)  # integer bin indices for slicing below

    # create spectrogram with new freq bins
    newspec = np.zeros([timebins, len(scale)], dtype=np.complex128)
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,scale[i]:], axis=1)
        else:        
            newspec[:,i] = np.sum(spec[:,scale[i]:scale[i+1]], axis=1)

    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[scale[i]:])]
        else:
            freqs += [np.mean(allfreqs[scale[i]:scale[i+1]])]

    return newspec, freqs

""" plot spectrogram"""
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"):
    samplerate, samples = wav.read(audiopath)
    s = stft(samples, binsize)

    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel

    timebins, freqbins = np.shape(ims)

    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    plt.colorbar()

    plt.xlabel("time (s)")
    plt.ylabel("frequency (hz)")
    plt.xlim([0, timebins-1])
    plt.ylim([0, freqbins])

    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()

    plt.clf()
plotstft("abc.wav")

Please guide me to understand how to extract the data; if not with matplotlib, recommend some other library that will help me achieve this.

First of all, this looks like my code, which is stated to be under a CC license. I don't take it too seriously, but you should not ignore those aspects (you omitted the statement of authorship in this case); others could be more miffed about such a thing.

To your question: in this code the STFT isn't computed by matplotlib, but just by numpy. You can get it like this:

samplerate, samples = wav.read(audiopath)
s = stft(samples, 1024)

I am not sure why you want a 3D array? It is a 2D array, but it is complex valued. If you want to save it in a .mat file:

from scipy.io import savemat
savemat("file.mat", {'arr': s})

You can see that once the WAV audio file is read into the variable samples, it is passed to a function called stft:

samplerate, samples = wav.read(audiopath)
s = stft(samples, binsize)

Here you already have access to the audio samples in the variable samples in the form of integers ... be aware that the bit depth will affect the number of bytes per sample as represented by that series of integers ... also know your endianness (left to right or vice versa) ... however, in the function stft that array is further processed into an array of floats in the variable frames before it is passed into np.fft.rfft.

Depending on your needs, those are your access choices without doing any processing of your own.
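For example, here is a minimal sketch of inspecting those raw samples and scaling them to floats; the file name and the 16-bit normalisation factor are assumptions for illustration:

import numpy as np
import scipy.io.wavfile as wav

samplerate, samples = wav.read("abc.wav")   # "abc.wav" is a placeholder path

print(samples.dtype)    # e.g. int16 for 16-bit PCM; the dtype reflects the file's bit depth
print(samples.shape)    # (n_samples,) for mono, (n_samples, n_channels) for stereo

if samples.ndim > 1:
    samples = samples[:, 0]         # keep a single channel for a mono analysis

if samples.dtype == np.int16:
    samples = samples / 32768.0     # scale 16-bit integers into the range [-1, 1]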
