Plotting an audio spectrogram in Python

I currently have a few thousand audio clips that I need to classify with machine learning.

After some digging, I found that applying a short-time Fourier transform (STFT) to the audio turns it into a two-dimensional image, so I can use various image classification algorithms on these images instead of on the audio files themselves.
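A minimal NumPy sketch of what an STFT computes may help here. It assumes a Hann window, 1024-sample frames and a hop of 512 samples (illustrative choices, not necessarily the defaults of the stft package used below), and `stft_magnitude` is a hypothetical helper. Each column is the magnitude spectrum of one windowed frame, so the result is a 2-D array of frequency bins by frames, which is what gets drawn as an image.

```python
import numpy as np

def stft_magnitude(signal, n_fft=1024, hop=512):
    """Magnitude STFT of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop  # only frames that fit entirely (no padding)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft of each windowed frame gives (n_frames, n_fft // 2 + 1); transpose so rows are frequencies
    return np.abs(np.fft.rfft(frames, axis=1)).T

t = np.arange(10240)
S = stft_magnitude(np.sin(2 * np.pi * 64 * t / 1024))  # tone centered exactly on bin 64
print(S.shape)               # (513, 19)
print(S.argmax(axis=0)[:3])  # [64 64 64]
```

Rows index frequency and columns index time, so plotting `S` with `origin='lower'` puts low frequencies at the bottom.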

To this end, I found a Python package that computes the STFT, so all I need to do is plot the result to get the images. For plotting, I found this GitHub repo very useful.

My code ended up like this:

import stft
import numpy as np
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt

def save_stft_image(source_filename, destination_filename):
    fs, audio = wav.read(source_filename)
    X = stft.spectrogram(audio)

    print(X.shape)

    # Axes fill the whole figure with no ticks or frame, so only the image is saved
    fig = plt.figure()
    ax = fig.add_axes([0, 0, 1, 1])
    ax.set_axis_off()
    # X[:][:][0] is equivalent to X[0] (it indexes the first axis)
    ax.imshow(np.abs(X[:][:][0].T), origin='lower', aspect='auto', interpolation='nearest')
    fig.savefig(destination_filename)
    plt.close(fig)  # release the figure; matters when processing thousands of files

save_stft_image("Example.wav", "Example.png")

And the output is: [image: output spectrogram]

The code works; however, I noticed that the line that prints X.shape outputs (513L, 943L, 2L), so the result is three-dimensional. I only get an image when I write X[:][:][0] or X[:][:][1].

I keep reading about the "redundancy" of the STFT, i.e. that you can discard half of it because you don't need it. Is that third dimension the redundancy, or am I doing something very wrong here? If so, how do I plot it properly?
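The redundancy can be checked directly with a NumPy sketch that is independent of the stft package. For real-valued input, the DFT of a length-N frame satisfies X[N-k] = conj(X[k]), so only N/2 + 1 bins are unique, and `numpy.fft.rfft` returns exactly those. Since 513 = 1024/2 + 1, the first axis in the question is consistent with (assuming a 1024-sample frame) the package already having dropped the redundant half; a discarded half would have around 511 bins, so the size-2 third axis cannot be it.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)  # any real-valued frame of 1024 samples

full = np.fft.fft(frame)   # all 1024 complex bins
half = np.fft.rfft(frame)  # only the non-negative frequencies

# For real input, bin N-k is the complex conjugate of bin k,
# so bins 513..1023 carry no new information.
N = len(frame)
k = np.arange(1, N // 2)
print(np.allclose(full[N - k], np.conj(full[k])))  # True
print(half.shape)                                  # (513,)
print(np.allclose(half, full[: N // 2 + 1]))       # True
```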

Thank you.

Edit: The new code and its output are:

import stft
import numpy as np
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt

def save_stft_image(source_filename, destination_filename):
    fs, audio = wav.read(source_filename)
    if audio.ndim > 1:
        audio = np.mean(audio, axis=1)  # average the channels to get mono
    X = stft.spectrogram(audio)

    print(X.shape)

    fig = plt.figure()
    ax = fig.add_axes([0, 0, 1, 1])
    ax.set_axis_off()
    # X.T puts frames on the vertical axis and frequency bins on the horizontal axis
    ax.imshow(np.abs(X.T), origin='lower', aspect='auto', interpolation='nearest')
    fig.savefig(destination_filename)
    plt.close(fig)

save_stft_image("Example.wav", "Example.png")

[image: output spectrogram]

On the left I get an almost invisible column of colors. The sounds I am working with are respiratory sounds, which have very low frequencies; maybe that's why the visualization is such a thin column.
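One plausible contributor to the thin column (an assumption, not confirmed in the thread) is that `X.T` puts frequency on the horizontal axis, so low-frequency energy becomes a narrow strip on the left, while linear magnitudes leave quieter components nearly black. The sketch below plots a magnitude array in decibels without transposing and can optionally keep only low frequencies. `save_db_spectrogram` and `max_freq` are hypothetical names, and the bin-to-frequency mapping assumes `X` is a one-sided spectrum computed from an even frame length, with rows as frequency bins.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, e.g. when batch-processing many files
import matplotlib.pyplot as plt

def save_db_spectrogram(X, fs, destination, max_freq=None):
    """Save |X| (frequency bins x frames) as a dB image with low frequencies at the bottom."""
    mag = np.abs(X)
    n_bins = mag.shape[0]
    n_fft = 2 * (n_bins - 1)  # assumes a one-sided spectrum from an even frame length
    freqs = np.arange(n_bins) * fs / n_fft
    if max_freq is not None:
        mag = mag[freqs <= max_freq]  # keep only the low-frequency rows
    db = 20 * np.log10(mag + 1e-10)  # log scale makes quiet components visible
    fig = plt.figure()
    ax = fig.add_axes([0, 0, 1, 1])
    ax.set_axis_off()
    # No transpose: rows are frequencies, so frequency runs along the vertical axis
    ax.imshow(db, origin="lower", aspect="auto", interpolation="nearest")
    fig.savefig(destination)
    plt.close(fig)
    return db.shape
```

For example, `save_db_spectrogram(X, fs, "Example.png", max_freq=1000)` would keep only bins up to 1 kHz; the cutoff is illustrative and would need tuning for respiratory sounds.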

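You probably have a stereo audio file, so the last axis of X holds the channels, and a single channel is selected with X[:, :, 0] or X[:, :, 1]. (X[:][:][0] is not equivalent: each [:] returns the whole array, so it reduces to X[0].)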
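A quick shape check with a placeholder array (zeros standing in for the real STFT, assuming the axis order frequency bins, frames, channels implied by the printed shape) shows the difference between chained and tuple indexing:

```python
import numpy as np

# Placeholder with the shape reported in the question: (frequency bins, frames, channels)
X = np.zeros((513, 943, 2), dtype=complex)

print(X[:][:][0].shape)  # (943, 2): same as X[0], the first frequency bin
print(X[:, :, 0].shape)  # (513, 943): spectrogram of channel 0
print(X[..., 1].shape)   # (513, 943): spectrogram of channel 1
```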

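You can convert multichannel audio to mono by averaging the channels with scipy.mean(audio, axis=1) (numpy.mean in newer versions, since scipy.mean is a deprecated alias).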
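`scipy.io.wavfile.read` returns a 1-D array for mono files and a (samples, channels) array otherwise, so averaging over `axis=1` fails on a mono file. A small guard handles both cases; `to_mono` is a hypothetical helper name, and `np.mean` is used in place of the deprecated `scipy.mean`.

```python
import numpy as np

def to_mono(audio):
    """Average the channels of a (samples, channels) array; 1-D input is returned as float."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio.astype(np.float64)
    # np.mean accumulates integer input (e.g. int16) in float64, so no overflow
    return audio.mean(axis=1)
```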
