I have some audio files and want to plot their average spectrum, the way the "Audacity" software does, using Python (librosa library). As far as I can see, Audacity plots amplitude against frequency, averaged over the entire audio file.


After that, I want to apply a CNN to classify the samples into two classes. I'm looking for suggestions.

Thank you.

Usually you use librosa.display.specshow to plot spectrograms over time, not averaged over the whole file. In fact, as input for your CNN you might prefer a spectrogram over time, as produced by librosa.stft, or a Mel spectrogram, depending on what your classification goal is.

E.g., if you want to classify by genre, a Mel spectrogram may be most appropriate. If you want to find out key or chords, you'll need a constant-Q spectrogram (CQT), etc.
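To make the spectrogram-over-time input suggested above a bit more concrete, here is a minimal sketch of how such a 2-D array could be prepared for a CNN. It uses plain NumPy on a synthetic tone so that it runs without an audio file. The helper names (`stft_magnitude`, `standardize`), the log compression, and the channels-last layout are assumptions for illustration; in practice librosa.stft or a Mel spectrogram would take the place of the hand-written STFT.

```python
import numpy as np

def stft_magnitude(y, n_fft=2048, hop_length=1024):
    """Magnitude spectrogram of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def standardize(features, eps=1e-8):
    """Shift and scale the whole array to zero mean and unit variance."""
    return (features - features.mean()) / (features.std() + eps)

# Hypothetical test signal: one second of a 440 Hz sine (stands in for a loaded file)
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)

S = stft_magnitude(y)        # frequency bins x time frames
log_S = np.log1p(S)          # compress the dynamic range (assumed choice)
X = standardize(log_S)       # zero mean, unit variance

# Many CNN frameworks expect (batch, height, width, channels); this layout is an assumption
cnn_input = X[np.newaxis, ..., np.newaxis]
print(S.shape, cnn_input.shape)
```

Unlike the whole-file average, this keeps the time axis, so the network can pick up on how the spectrum changes over the recording. In a real setup the mean and standard deviation would typically be computed on the training set and reused for validation and test data.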

That said, here's some code that answers your question:

import librosa
import numpy as np
import matplotlib.pyplot as plt

file = YOUR_FILE
# load the file (resampled to 44.1 kHz, mixed down to mono by default)
y, sr = librosa.load(file, sr=44100)
# short time fourier transform
# (n_fft and hop length determine frequency/time resolution)
n_fft = 2048
S = librosa.stft(y, n_fft=n_fft, hop_length=n_fft//2)
# convert to db
# (for your CNN you might want to skip this and rather ensure zero mean and unit variance)
D = librosa.amplitude_to_db(np.abs(S), ref=np.max)
# average over file (one value per frequency bin)
D_AVG = np.mean(D, axis=1)

# one bar per frequency bin; bin n is centered at n * sr / n_fft Hz
plt.bar(np.arange(D_AVG.shape[0]), D_AVG)
x_ticks_positions = [n for n in range(0, n_fft // 2, n_fft // 16)]
x_ticks_labels = [str(sr / n_fft * n) + 'Hz' for n in x_ticks_positions]
plt.xticks(x_ticks_positions, x_ticks_labels)
plt.xlabel('Frequency')
plt.ylabel('dB')
plt.show()

This produces a bar chart of the average level in dB for each frequency bin, with the x-axis labeled in Hz.
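One detail worth knowing: the code above averages values that are already in dB, which roughly corresponds to a geometric mean of the magnitudes. An alternative is to average the power per frequency bin first and convert to dB afterwards. The sketch below shows that variant in plain NumPy on a synthetic signal; the function name `average_spectrum_db` and the test tones are made up for illustration, and whether this matches a given tool's plot exactly is not something I have verified.

```python
import numpy as np

def average_spectrum_db(y, sr, n_fft=2048, hop_length=1024):
    """Return (frequencies in Hz, average power spectrum in dB relative to its peak)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    power_sum = np.zeros(n_fft // 2 + 1)
    for i in range(n_frames):
        frame = y[i * hop_length : i * hop_length + n_fft] * window
        power_sum += np.abs(np.fft.rfft(frame)) ** 2
    avg_power = power_sum / n_frames
    # 10 * log10 because these are power values; the floor avoids log(0)
    ref = max(avg_power.max(), 1e-12)
    avg_db = 10.0 * np.log10(np.maximum(avg_power, 1e-12) / ref)
    # center frequency of each bin in Hz
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return freqs, avg_db

# Hypothetical test signal: a strong 1 kHz tone plus a weaker 5 kHz tone
sr = 44100
t = np.arange(2 * sr) / sr
y = np.sin(2 * np.pi * 1000.0 * t) + 0.1 * np.sin(2 * np.pi * 5000.0 * t)

freqs, avg_db = average_spectrum_db(y, sr)
peak_hz = freqs[np.argmax(avg_db)]
print(f"strongest bin is centered near {peak_hz:.0f} Hz")
```

Because `freqs` is already in Hz, `plt.plot(freqs, avg_db)` gives a correctly labeled frequency axis without computing tick labels by hand.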


import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

# read the WAV file (assumes a mono recording; for stereo, pick one channel first)
sample_rate, samples = wavfile.read('h1.wav')
# power spectrogram: frequency bins x time segments
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)

plt.pcolormesh(times, frequencies, spectrogram)

plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
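Two things can trip up the scipy approach: wavfile.read returns a 2-D array of shape (n_samples, n_channels) for stereo files, and a spectrogram plotted on a linear power scale often looks almost uniformly dark. The following sketch shows two small helpers that could address both. The names `to_mono` and `power_to_db` are hypothetical and not part of scipy, and the input arrays are toy data.

```python
import numpy as np

def to_mono(samples):
    """Average the channels of a (n_samples, n_channels) array; pass 1-D input through."""
    samples = np.asarray(samples, dtype=np.float64)
    return samples.mean(axis=1) if samples.ndim == 2 else samples

def power_to_db(power, floor=1e-12):
    """Convert power values to dB relative to the largest value."""
    power = np.maximum(np.asarray(power, dtype=np.float64), floor)
    return 10.0 * np.log10(power / power.max())

# Toy stereo data in the layout wavfile.read uses for two-channel files
stereo = np.array([[1000, 3000], [-2000, 0], [500, 500]], dtype=np.int16)
mono = to_mono(stereo)  # [2000., -1000., 500.]

# Toy power values spanning three orders of magnitude
db = power_to_db(np.array([1.0, 0.1, 0.001]))  # approximately [0, -10, -30]
print(mono, db)
```

With these, `samples = to_mono(samples)` would go before the call to signal.spectrogram, and `plt.pcolormesh(times, frequencies, power_to_db(spectrogram))` would plot the result on a dB scale.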

