
What is returned by the command that extracts MFCCs?

I have been learning sound analysis and have come across the term MFCC. When I execute librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40), I get a NumPy array of shape 40 by 216. So I understand that I have extracted 40 features over 216 frames. But what exactly is meant by frames here? Is it similar to the sample rate, and where do we define it when loading the audio file?

What's not immediately obvious from the mfcc docs is that it calls librosa.feature.melspectrogram internally. melspectrogram has the parameters win_length / n_fft and hop_length, which define a frame. You can also pass these parameters to mfcc.

So what's a frame? Basically, it's the result of processing a bunch of raw samples. It is not the same thing as the sample rate, which only tells you how many raw samples make up one second of audio. Assuming a window length of 2048 samples (that's the default) and a hop length of 512 (also the default), each of the frames returned by mfcc corresponds to 2048 raw samples and is 512 samples "further along in the audio" than its predecessor. In other words, there is a significant overlap between frames: with these defaults, consecutive frames share 1536 samples.
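To make the numbers concrete, here is a rough sketch of the frame arithmetic in plain Python. It assumes librosa's default centered framing (center=True), where frame t is centered on sample t * hop_length and the signal is padded at both ends. The helper names num_frames and frame_span are made up for illustration and are not part of librosa, and the 5-second clip at 22050 Hz is only an assumption that happens to match the 216 frames in the question.

```python
def num_frames(n_samples, hop_length=512):
    # With centered framing, the signal is padded at both ends, so the
    # frame count depends only on the signal length and the hop length.
    return 1 + n_samples // hop_length


def frame_span(t, n_fft=2048, hop_length=512):
    # Sample range (start inclusive, end exclusive) covered by frame t.
    # Negative indices fall into the padding added before the signal.
    center = t * hop_length
    return center - n_fft // 2, center + n_fft // 2


sr = 22050           # assumed sample rate (librosa.load's default)
n_samples = 5 * sr   # assumed clip length: 5 seconds

print(num_frames(n_samples))  # 216
print(frame_span(0))          # (-1024, 1024)
print(frame_span(1))          # (-512, 1536)
```

Frames 0 and 1 start 512 samples apart but each spans 2048 samples, which is the overlap described above.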

As an example, to create MFCCs for your audio, defining a frame as 1024 samples with a hop length of 512, you could call:

librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40, hop_length=512, n_fft=1024)

Again, if you are not explicitly passing those arguments, defaults from melspectrogram are used.
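If you want to relate a frame index back to time, the hop length and the sample rate are all you need. The following is a small sketch in plain Python; the function names are hypothetical, and the default sr=22050 is an assumption (it is what librosa.load resamples to unless you pass sr yourself). librosa also ships librosa.frames_to_time, which does this conversion for you.

```python
def frame_to_seconds(t, sr=22050, hop_length=512):
    # Frame t sits t * hop_length samples into the audio;
    # dividing by the sample rate converts samples to seconds.
    return t * hop_length / sr


def frames_per_second(sr=22050, hop_length=512):
    # How many frames are produced for each second of audio.
    return sr / hop_length


print(frames_per_second())              # 43.06640625
print(round(frame_to_seconds(215), 3))  # 4.992
```

So the sample rate is fixed when the audio is loaded, while the frame rate is a consequence of the hop_length you pass to mfcc.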

