
How to read an MP3 audio file into a numpy array / save a numpy array to MP3?

Is there a way to read an MP3 audio file into a numpy array, and write a numpy array to an MP3 file, with an API similar to scipy.io.wavfile.read and scipy.io.wavfile.write? For example:

from scipy.io import wavfile

sr, x = wavfile.read('test.wav')
wavfile.write('test2.wav', sr, x)

Note: pydub's AudioSegment object doesn't give direct access to a numpy array.

PS: I have already read "Importing sound files into Python as NumPy arrays (alternatives to audiolab)" and tried all the answers, including those that require calling ffmpeg via Popen and reading the content from the stdout pipe. I have also read "Trying to convert an mp3 file to a Numpy Array, and ffmpeg just hangs" and tried the main answers, but found no simple solution. After spending hours on this, I'm posting it here under "Answer your own question – share your knowledge, Q&A-style". I have also read "How to create a numpy array from a pydub AudioSegment?", but it does not easily cover the multi-channel case.

Calling ffmpeg and manually parsing its stdout, as suggested in many posts about reading an MP3, is tedious (there are many corner cases, for example because files can have different numbers of channels), so here is a working solution using pydub (you need to pip install pydub first).

This code reads an MP3 into a numpy array and writes a numpy array to an MP3 file, with an API similar to scipy.io.wavfile.read/write:

import pydub
import numpy as np

def read(f, normalized=False):
    """MP3 to numpy array"""
    a = pydub.AudioSegment.from_mp3(f)
    # Samples come back as one flat, interleaved sequence (L, R, L, R, ...)
    y = np.array(a.get_array_of_samples())
    if a.channels == 2:
        # One row per frame, one column per channel
        y = y.reshape((-1, 2))
    if normalized:
        # Assumes 16-bit samples: scale to floats in [-1, 1)
        return a.frame_rate, np.float32(y) / 2**15
    else:
        return a.frame_rate, y

def write(f, sr, x, normalized=False):
    """numpy array to MP3"""
    # A 2-D array with two columns is treated as stereo, anything else as mono
    channels = 2 if (x.ndim == 2 and x.shape[1] == 2) else 1
    if normalized:  # normalized array - each item should be a float in [-1, 1)
        y = np.int16(x * 2 ** 15)
    else:
        y = np.int16(x)
    # sample_width=2 means 2 bytes (16 bits) per sample
    song = pydub.AudioSegment(y.tobytes(), frame_rate=sr, sample_width=2, channels=channels)
    song.export(f, format="mp3", bitrate="320k")

Notes:

  • It only works for 16-bit files for now (24-bit WAV files are pretty common, but I've rarely seen 24-bit MP3 files. Do they exist?)
  • normalized=True lets you work with a float array (each item in [-1, 1))
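As a rough illustration of the normalized convention, here is a minimal numpy-only sketch of the int16 to float scaling and back. The helper names int16_to_float and float_to_int16 are hypothetical, and the clipping step is an addition that is not part of the write function above: it is there because a value of exactly 1.0 scales to 32768, which does not fit in int16.

```python
import numpy as np

def int16_to_float(y):
    """Scale int16 samples to float32 in [-1, 1), as read(..., normalized=True) does."""
    return np.float32(y) / 2**15

def float_to_int16(x):
    """Scale floats back to int16 (hypothetical helper, not part of the answer's API).

    Clipping is an added safeguard: 1.0 * 2**15 == 32768 is out of range for int16.
    """
    scaled = np.round(np.asarray(x, dtype=np.float64) * 2**15)
    return np.clip(scaled, -2**15, 2**15 - 1).astype(np.int16)

# Round trip over the extreme and near-zero int16 values
samples = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
as_float = int16_to_float(samples)
round_trip = float_to_int16(as_float)
```

If your float data can reach exactly 1.0, converting with something like float_to_int16 before calling write(..., normalized=False) is one way to stay within the 16-bit range.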

Usage example:

sr, x = read('test.mp3')
print(x)

#[[-225  707]
# [-234  782]
# [-205  755]
# ..., 
# [ 303   89]
# [ 337   69]
# [ 274   89]]

write('out2.mp3', sr, x)
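For comparison, the manual route mentioned at the top of this answer comes down to interpreting raw interleaved 16-bit PCM bytes yourself. The sketch below is only an illustration under assumptions: pcm16_to_array and decode_with_ffmpeg are hypothetical helpers, the ffmpeg command assumes ffmpeg is on the PATH, and the sample rate and channel count are forced on the command line rather than detected from the file.

```python
import subprocess
import numpy as np

def pcm16_to_array(raw, channels):
    """Interpret raw interleaved little-endian 16-bit PCM as (n_frames, channels)."""
    y = np.frombuffer(raw, dtype="<i2")
    if channels > 1:
        # Drop a trailing partial frame, if any, so the reshape cannot fail
        y = y[: len(y) - len(y) % channels]
        y = y.reshape((-1, channels))
    return y

def decode_with_ffmpeg(path, sr=44100, channels=2):
    """Hypothetical helper: ask ffmpeg for raw s16le PCM on stdout (needs ffmpeg installed)."""
    cmd = ["ffmpeg", "-i", path, "-f", "s16le", "-acodec", "pcm_s16le",
           "-ar", str(sr), "-ac", str(channels), "-"]
    raw = subprocess.run(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.DEVNULL, check=True).stdout
    return sr, pcm16_to_array(raw, channels)

# Demonstration without ffmpeg: serialize three stereo frames and parse them back
frames = np.array([[-225, 707], [-234, 782], [-205, 755]], dtype="<i2")
decoded = pcm16_to_array(frames.tobytes(), channels=2)
```

The reshape here is the same de-interleaving step that read performs on the output of get_array_of_samples.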

You can use the audio2numpy library. Install it with

pip install audio2numpy

Then, your code would be:

import audio2numpy as a2n
x, sr = a2n.audio_from_file("test.mp3")

For writing, use @Basj's answer.
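Note that, going by the snippet above, audio_from_file returns (x, sr), while scipy.io.wavfile.read returns (sr, x). If you want the wavfile-style order, a small adapter is enough. This is a minimal sketch: read_sr_first and its loader parameter are hypothetical names, and fake_loader only stands in for a real loader so the example runs without any audio file.

```python
def read_sr_first(path, loader):
    """Call a loader that returns (samples, sample_rate) and swap to (sample_rate, samples)."""
    x, sr = loader(path)
    return sr, x

def fake_loader(path):
    """Stand-in for a real loader; returns (samples, sample_rate) like the snippet above."""
    return [0.0, 0.5, -0.5], 44100

sr, x = read_sr_first("test.mp3", fake_loader)

# With the real library, the call would presumably look like:
#   import audio2numpy as a2n
#   sr, x = read_sr_first("test.mp3", a2n.audio_from_file)
```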
