
Reduce MFCC output

I am trying to analyze song audio using a Python library. The output is a NumPy array, and it is very large because the MFCCs are calculated for every frame of the audio. When I write this output to a file, each song takes about 3-4 MB. Is there a way to reduce the N frames of information to a single row of features?

[Image: MFCC output]

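A common practice is to group consecutive frames into texture windows, compute summary statistics over each texture window, and then summarize those window-level values again with summary statistics taken across the whole song. The result is one fixed-length feature row per song, regardless of how many frames it has.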
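A minimal NumPy sketch of this two-stage aggregation, assuming the MFCC matrix has shape (n_mfcc, n_frames) as returned by librosa.feature.mfcc, that texture windows do not overlap, and that a trailing partial window is simply dropped. The function name texture_window_summary and the default choice of mean and standard deviation are illustrative assumptions, not a fixed recipe.

```python
import numpy as np

def texture_window_summary(mfcc, frames_per_window, stats=(np.mean, np.std)):
    """Collapse an (n_mfcc, n_frames) MFCC matrix into a single feature row.

    Hypothetical helper. Stage 1 applies each statistic per coefficient over
    non-overlapping texture windows; stage 2 applies each statistic again over
    the window-level values. Output length is n_mfcc * len(stats) ** 2.
    """
    n_mfcc, n_frames = mfcc.shape
    n_windows = n_frames // frames_per_window  # incomplete last window is dropped
    if n_windows == 0:
        raise ValueError("signal is shorter than one texture window")
    trimmed = mfcc[:, :n_windows * frames_per_window]
    # Group consecutive frames: shape (n_mfcc, n_windows, frames_per_window)
    windows = trimmed.reshape(n_mfcc, n_windows, frames_per_window)
    # Stage 1: per-window statistics, stacked to (n_mfcc * len(stats), n_windows)
    window_stats = np.concatenate([f(windows, axis=2) for f in stats], axis=0)
    # Stage 2: summarize each window-level series across the whole song
    return np.concatenate([f(window_stats, axis=1) for f in stats])

# Usage with random data standing in for a real MFCC matrix
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((20, 1300))
row = texture_window_summary(mfcc, frames_per_window=43)
print(row.shape)  # (80,)
```

With 20 coefficients and two statistics at each stage, every song becomes 80 numbers instead of 20 × N; passing stats=(np.mean, np.std, np.min, np.max) would give 320.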

The statistics are computed per input feature (per MFCC coefficient, in your case). Example statistics are the mean, standard deviation, minimum, and maximum. Texture window lengths can range from about 1 to 60 seconds.
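Because the window length is usually chosen in seconds while the MFCC matrix is indexed in frames, it has to be converted using the sample rate and hop length of the feature extractor. The sketch below assumes librosa's defaults (sr=22050, hop_length=512); texture_frames is a hypothetical helper, and the values should be changed to whatever your extractor actually uses.

```python
def texture_frames(window_seconds, sr=22050, hop_length=512):
    """Number of MFCC frames spanning `window_seconds` of audio.

    Hypothetical helper; the sr and hop_length defaults are assumptions
    (librosa's defaults), not values taken from the question.
    """
    if window_seconds <= 0:
        raise ValueError("window length must be positive")
    # One frame every hop_length samples -> sr / hop_length frames per second
    return max(1, round(window_seconds * sr / hop_length))

for secs in (1, 3, 60):
    print(secs, "s ->", texture_frames(secs), "frames")
```

At these settings there are about 43 frames per second, so the 1 to 60 second range corresponds to roughly 43 to 2584 frames per texture window.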

See "Low-level features and timbre", Juan Pablo Bello, MPATE-GE 2623 Music Information Retrieval.
