Reading *.wav files in Python

I need to analyze sound stored in a .wav file. For that I need to transform this file into a set of numbers (arrays, for example). I think I need to use the wave package. However, I do not know exactly how it works. For example, I did the following:

import wave
w = wave.open('/usr/share/sounds/ekiga/voicemail.wav', 'r')
for i in range(w.getnframes()):
    frame = w.readframes(i)
    print frame

As a result of this code I expected to see sound pressure as a function of time. Instead I see a lot of strange, mysterious symbols (which are not hexadecimal numbers). Can anybody please help me with that?

Per the documentation, scipy.io.wavfile.read(somefile) returns a tuple of two items: the first is the sampling rate in samples per second, the second is a numpy array with all the data read from the file:

from scipy.io import wavfile
samplerate, data = wavfile.read('./output/audio.wav')

Using the struct module, you can take the wave frames (which are in two's complement binary between -32768 and 32767, i.e. 0x8000 and 0x7FFF) and unpack them into integers. This reads a mono, 16-bit WAVE file. I found this webpage quite useful in formulating this:

import wave, struct

wavefile = wave.open('sine.wav', 'r')

length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)
    print(int(data[0]))

This snippet reads 1 frame at a time. To read more than one frame (e.g., 13), use

wavedata = wavefile.readframes(13)
data = struct.unpack("<13h", wavedata)
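Generalizing the two snippets above, a minimal sketch that reads a mono 16-bit file in chunks of any size might look like this. The helper name read_mono16 and the chunk size are illustrative assumptions, not part of the original answer; the demo builds a tiny WAV in memory so it runs without a sine.wav on disk.

```python
import io
import struct
import wave

def read_mono16(wav, chunk=13):
    """Return all samples of a mono 16-bit wave.Wave_read object as a list of ints."""
    samples = []
    while True:
        raw = wav.readframes(chunk)      # may be shorter than `chunk` at end of file
        if not raw:
            break
        count = len(raw) // 2            # 2 bytes per 16-bit mono frame
        samples.extend(struct.unpack("<%dh" % count, raw))
    return samples

# Demo: write 30 known samples to an in-memory WAV, then read them back.
original = [i * 100 - 1500 for i in range(30)]
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(struct.pack("<%dh" % len(original), *original))
buf.seek(0)
with wave.open(buf, "rb") as wav:
    decoded = read_mono16(wav)
```

The format string is built from the number of bytes actually returned, so the last, shorter chunk unpacks without error.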

Different Python modules to read wav:

There are at least the following libraries to read wave audio files:

The simplest example:

This is a simple example with SoundFile:

import soundfile as sf
data, samplerate = sf.read('existing_file.wav') 

Format of the output:

Warning: the data are not always in the same format; it depends on the library. For instance:

from scikits import audiolab
from scipy.io import wavfile
from sys import argv
for filepath in argv[1:]:
    x, fs, nb_bits = audiolab.wavread(filepath)
    print('Reading with scikits.audiolab.wavread:', x)
    fs, x = wavfile.read(filepath)
    print('Reading with scipy.io.wavfile.read:', x)

Output:

Reading with scikits.audiolab.wavread: [ 0.          0.          0.         ..., -0.00097656 -0.00079346 -0.00097656]
Reading with scipy.io.wavfile.read: [  0   0   0 ..., -32 -26 -32]

SoundFile and Audiolab return floats between -1 and 1 (as MATLAB does; that is the convention for audio signals). SciPy and wave return integers, which you can convert to floats according to the number of bits of encoding, for example:

from scipy.io.wavfile import read as wavread
samplerate, x = wavread(audiofilename)  # x is a numpy array of integers, representing the samples
# scale to -1.0 -- 1.0
if x.dtype == 'int16':
    nb_bits = 16  # -> 16-bit wav files
elif x.dtype == 'int32':
    nb_bits = 32  # -> 32-bit wav files
else:
    raise ValueError('unsupported sample type: %s' % x.dtype)
max_nb_bit = float(2 ** (nb_bits - 1))
samples = x / max_nb_bit  # samples is a numpy array of floats representing the samples
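As a sanity check on the scaling, here is a small NumPy-free sketch. The function name pcm_to_float is hypothetical. Dividing by 2 ** (nb_bits - 1) turns the integers printed by scipy.io.wavfile.read in the output above (-32, -26) into the floats printed by audiolab (-0.00097656, -0.00079346).

```python
def pcm_to_float(samples, nb_bits):
    """Scale signed integer PCM samples to floats in [-1.0, 1.0)."""
    full_scale = float(2 ** (nb_bits - 1))   # 32768.0 for 16-bit audio
    return [s / full_scale for s in samples]

floats = pcm_to_float([0, -32, -26, 32767, -32768], 16)
```

With this divisor the most negative 16-bit value maps exactly to -1.0, and the most positive one stays just below 1.0.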

IMHO, the easiest way to get audio data from a sound file into a NumPy array is SoundFile:

import soundfile as sf
data, fs = sf.read('/usr/share/sounds/ekiga/voicemail.wav')

This also supports 24-bit files out of the box.

There are many sound file libraries available; I've written an overview where you can see a few pros and cons. It also features a page explaining how to read a 24-bit wav file with the wave module.

You can accomplish this using the scikits.audiolab module. It requires NumPy and SciPy to function, and also libsndfile.

Note, I was only able to get it to work on Ubuntu and not on OS X.

from scikits.audiolab import wavread

filename = "testfile.wav"

data, sample_frequency, encoding = wavread(filename)

Now you have the wav data.

If you want to process audio block by block, some of the given solutions are quite awful in the sense that they imply loading the whole audio into memory, producing many cache misses and slowing down your program. python-wavefile provides some Pythonic constructs to do NumPy block-by-block processing using efficient and transparent block management by means of generators. Other Pythonic niceties are context managers for files, metadata as properties... and if you want the whole-file interface, because you are developing a quick prototype and you don't care about efficiency, the whole-file interface is still there.

A simple example of processing would be:

import sys
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r :
    with WaveWriter(
            'output.wav',
            channels=r.channels,
            samplerate=r.samplerate,
            ) as w :

        # Just to set the metadata
        w.metadata.title = r.metadata.title + " II"
        w.metadata.artist = r.metadata.artist

        # This is the processing loop
        for data in r.read_iter(size=512) :
            data[1] *= .8     # lower volume on the second channel
            w.write(data)

The example reuses the same block to read the whole file, even in the case of the last block, which is usually smaller than the requested size. In this case you get a slice of the block. So trust the returned block length instead of using a hardcoded 512 size for any further processing.
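For comparison, the same block-by-block idea can be sketched with only the standard-library wave module. This is not python-wavefile's API: iter_blocks is a hypothetical helper, and it yields raw bytes rather than NumPy arrays. It does illustrate why the last block's length should be trusted rather than assumed.

```python
import io
import wave

def iter_blocks(wav, size=512):
    """Yield successive chunks of raw frame bytes from an open wave.Wave_read."""
    while True:
        block = wav.readframes(size)
        if not block:
            return
        yield block

# Demo on an in-memory file of 1000 silent mono 16-bit frames.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(bytes(2 * 1000))
buf.seek(0)
with wave.open(buf, "rb") as wav:
    frame_size = wav.getsampwidth() * wav.getnchannels()
    block_lengths = [len(b) // frame_size for b in iter_blocks(wav)]
```

Here 1000 frames split into one full block of 512 frames and a final block of 488, so any downstream code has to use the length of the block it actually received.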

My dear, as far as I understand what you are looking for, you are getting into a theory field called Digital Signal Processing (DSP). This engineering area ranges from simple analysis of discrete-time signals to complex adaptive filters. A nice idea is to think of a discrete-time signal as a vector, where each element of this vector is a sampled value of the original, continuous-time signal. Once you get the samples in vector form, you can apply different digital signal techniques to this vector.

Unfortunately, in Python, moving from audio files to a NumPy array is rather cumbersome, as you may have noticed... If you don't idolize one programming language over another, I highly suggest trying out MATLAB/Octave. MATLAB makes accessing the samples from files straightforward. audioread() does this task for you :) And there are a lot of toolboxes designed specifically for DSP.

Nevertheless, if you really intend to get into Python for this, I'll give you a step-by-step guide.


1. Get the samples

The easiest way to get the samples from the .wav file is:

from scipy.io import wavfile

sampling_rate, samples = wavfile.read('/path/to/file.wav')


Alternatively, you could use the wave and struct packages to get the samples:

import numpy as np
import wave, struct

wav_file = wave.open('/path/to/file.wav', 'rb')
# from .wav file to raw binary data (a bytes object)
binary_data = wav_file.readframes(wav_file.getnframes())
# from binary data to 16-bit integer samples
s = np.array(struct.unpack('{n}h'.format(n=wav_file.getnframes()*wav_file.getnchannels()), binary_data))

Answering your question: binary_data is a bytes object, which is not human-readable and can only make sense to a machine. You can validate this statement by typing type(binary_data). If you really want to understand a little bit more about this bunch of odd characters, click here.
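To make the "odd characters" less mysterious, here is a tiny illustration with hand-written bytes. The byte string is made up for the example, and the "<" prefix tells struct to treat it as little-endian. The same six bytes can be viewed as hexadecimal text or as three 16-bit integers.

```python
import struct

raw = b"\x00\x80\xff\x7f\x00\x00"       # six bytes = three 16-bit samples
as_hex = raw.hex()                       # hexadecimal view of the same bytes
as_ints = struct.unpack("<3h", raw)      # interpret as three signed 16-bit ints
```

The first two samples are the extremes of the 16-bit range, -32768 (0x8000) and 32767 (0x7FFF); printing raw directly shows escape sequences and stray characters because Python renders each byte as text where it can.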

If your audio is stereo (that is, has 2 channels), you can reshape this signal to achieve the same format obtained with scipy.io:

s_like_scipy = s.reshape(-1, wav_file.getnchannels())

Each column is a channel. Either way, the samples obtained from the .wav file can be used to plot and understand the temporal behavior of the signal.

In both alternatives, the samples obtained from the files are represented in Linear Pulse Code Modulation (LPCM).
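The reshape for stereo audio works because the samples of a multi-channel file are interleaved frame by frame (left, right, left, right, ...). A NumPy-free sketch of the same idea, using a hypothetical helper split_channels and made-up sample values:

```python
def split_channels(samples, nchannels):
    """Return one list per channel from interleaved samples (L0, R0, L1, R1, ...)."""
    return [list(samples[ch::nchannels]) for ch in range(nchannels)]

interleaved = [1, -1, 2, -2, 3, -3]      # three stereo frames
left, right = split_channels(interleaved, 2)
```

Each returned list corresponds to one column of the reshaped array.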


2. Do digital signal processing stuff on the audio samples

I'll leave that part up to you :) But this is a nice book to take you through DSP. Unfortunately, I don't know of good books with Python; they are usually horrible books... But do not worry about it: the theory can be applied in the very same way using any programming language, as long as you master that language.

Whatever book you pick up, stick with the classical authors, such as Proakis, Oppenheim, and so on... Do not worry about the programming language they use. For a more practical guide to DSP for audio using Python, see this page.

3. Play the filtered audio samples

import pyaudio

p = pyaudio.PyAudio()
stream = p.open(format = p.get_format_from_width(wav_file.getsampwidth()),
                channels = wav_file.getnchannels(),
                rate = wav_file.getframerate(),
                output = True)
# from samples to the new binary file
new_binary_data = struct.pack('{}h'.format(len(s)), *s)
stream.write(new_binary_data)

where wav_file.getsampwidth() is the number of bytes per sample, and wav_file.getframerate() is the sampling rate. Just use the same parameters as the input audio.


4. Save the result in a new .wav file

wav_file = wave.open('/path/to/new_file.wav', 'w')

wav_file.setparams((nchannels, sampwidth, sampling_rate, nframes, "NONE", "not compressed"))

for sample in s:
    wav_file.writeframes(struct.pack('h', int(sample)))

wav_file.close()  # finalizes the header and closes the file

where nchannels is the number of channels, sampwidth is the number of bytes per sample, sampling_rate is the sampling rate, and nframes is the total number of frames (samples per channel).
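A self-contained round trip can help confirm that the write step produces what you expect. The sketch below is an illustration rather than the answer's exact code: it writes a short made-up 440 Hz tone to an in-memory buffer (standing in for a real output path), packs all samples in one call, and reads them back.

```python
import io
import math
import struct
import wave

framerate = 8000
# One hundred samples of a 440 Hz sine at roughly half of 16-bit full scale.
tone = [int(16383 * math.sin(2 * math.pi * 440 * n / framerate)) for n in range(100)]

buf = io.BytesIO()                       # stands in for a real output path
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)                  # bytes per sample
    out.setframerate(framerate)
    out.writeframes(struct.pack("%dh" % len(tone), *tone))   # one call for all frames

buf.seek(0)
with wave.open(buf, "rb") as check:
    nframes = check.getnframes()
    read_back = list(struct.unpack("%dh" % nframes, check.readframes(nframes)))
```

Using the with statement closes the writer automatically, which is when the frame count in the header is made consistent with the data written.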

If you're going to perform transfers on the waveform data, then perhaps you should use SciPy, specifically scipy.io.wavfile.

I needed to read a 1-channel 24-bit WAV file. The post above by Nak was very useful. However, as mentioned above by basj, 24-bit is not straightforward. I finally got it working using the following snippet:

import numpy as np
from scipy.io import wavfile
TheFile = 'example24bit1channelFile.wav'
fs, x = wavfile.read(TheFile)

# convert the loaded data into a 24bit signal

nx = len(x)
ny = nx // 3 * 4    # four 3-byte samples are contained in three int32 words

y = np.zeros((ny,), dtype=np.int32)    # initialise array

# build the data left aligned in order to keep the sign bit operational.
# result will be factor 256 too high

y[0:ny:4] = ((x[0:nx:3] & 0x000000FF) << 8) | \
  ((x[0:nx:3] & 0x0000FF00) << 8) | ((x[0:nx:3] & 0x00FF0000) << 8)
y[1:ny:4] = ((x[0:nx:3] & 0xFF000000) >> 16) | \
  ((x[1:nx:3] & 0x000000FF) << 16) | ((x[1:nx:3] & 0x0000FF00) << 16)
y[2:ny:4] = ((x[1:nx:3] & 0x00FF0000) >> 8) | \
  ((x[1:nx:3] & 0xFF000000) >> 8) | ((x[2:nx:3] & 0x000000FF) << 24)
y[3:ny:4] = (x[2:nx:3] & 0x0000FF00) | \
  (x[2:nx:3] & 0x00FF0000) | (x[2:nx:3] & 0xFF000000)

y = y // 256   # correct for building 24 bit data left aligned in 32bit words

Some additional scaling is required if you need results between -1 and +1. Maybe some of you out there might find this useful.
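For the scaling step, a signed 24-bit sample spans -2**23 to 2**23 - 1, so dividing by 2**23 brings it into the -1 to +1 range. The sketch below is a different, simpler route than the snippet above. It assumes you already have the raw packed little-endian 3-byte samples (not the int32-packed layout untangled above), and decode_24bit is a hypothetical helper name.

```python
def decode_24bit(raw):
    """Decode packed little-endian signed 24-bit samples into floats in [-1.0, 1.0)."""
    ints = [int.from_bytes(raw[i:i + 3], "little", signed=True)
            for i in range(0, len(raw), 3)]
    return [v / float(2 ** 23) for v in ints]

# Most negative value, most positive value, and zero, three bytes each.
raw = b"\x00\x00\x80" + b"\xff\xff\x7f" + b"\x00\x00\x00"
scaled = decode_24bit(raw)
```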

If it's just two files and the sample rate is significantly high, you could just interleave them.

from scipy.io import wavfile
# File1, File2 and OUTPUT are placeholders for your own paths
rate1, dat1 = wavfile.read(File1)
rate2, dat2 = wavfile.read(File2)

if len(dat2) > len(dat1):  # make dat2 the shorter signal
    dat1, dat2 = dat2, dat1

output = dat1.copy()
for i in range(len(dat2) // 2):
    output[i*2] = dat2[i*2]

wavfile.write(OUTPUT, rate1, output)

You can also use the simple wavio library (import wavio). You also need to have some basic knowledge of sound.

PyDub (http://pydub.com/) has not been mentioned and that should be fixed. IMO this is the most comprehensive library for reading audio files in Python right now, although not without its faults. Reading a wav file:

from pydub import AudioSegment

audio_file = AudioSegment.from_wav('path_to.wav')
# or
audio_file = AudioSegment.from_file('path_to.wav')

# do whatever you want with the audio, change bitrate, export, convert, read info, etc.
# Check out the API docs http://pydub.com/

P.S. The example is about reading a wav file, but PyDub can handle a lot of various formats out of the box. The caveat is that it's based on both native Python wav support and ffmpeg, so you have to have ffmpeg installed, and a lot of the pydub capabilities rely on the ffmpeg version. Usually if ffmpeg can do it, so can pydub (which is quite powerful).

Non-disclaimer: I'm not related to the project, but I am a heavy user.

Here's a Python 3 solution using the built-in wave module [1] that works for n channels and 8, 16, 24... bits.

import sys
import wave

def read_wav(path):
    with wave.open(path, "rb") as wav:
        nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
        print(wav.getparams(), "\nBits per sample =", sampwidth * 8)

        signed = sampwidth > 1  # 8 bit wavs are unsigned
        byteorder = sys.byteorder  # wave module uses sys.byteorder for bytes

        values = []  # e.g. for stereo, values[i] = [left_val, right_val]
        for _ in range(nframes):
            frame = wav.readframes(1)  # read next frame
            channel_vals = []  # mono has 1 channel, stereo 2, etc.
            for channel in range(nchannels):
                as_bytes = frame[channel * sampwidth: (channel + 1) * sampwidth]
                as_int = int.from_bytes(as_bytes, byteorder, signed=signed)
                channel_vals.append(as_int)
            values.append(channel_vals)

    return values, framerate

You can turn the result into a NumPy array.

import numpy as np

data, rate = read_wav(path)
data = np.array(data)

Note, I've tried to make it readable rather than fast. I found reading all the data at once was almost 2x faster. E.g.

with wave.open(path, "rb") as wav:
    nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
    all_bytes = wav.readframes(-1)

framewidth = sampwidth * nchannels
frames = (all_bytes[i * framewidth: (i + 1) * framewidth]
            for i in range(nframes))

for frame in frames:
    ...

Although python-soundfile is roughly 2 orders of magnitude faster (hard to approach this speed with pure CPython).
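If you want to stay within the standard library, one option for the common 16-bit case only is to let the array module convert all the bytes in a single call instead of slicing per frame. This is a sketch under that assumption (it does not handle 8-bit or 24-bit data, and no timing is claimed); the sample values are made up.

```python
import array
import io
import wave

# Build a small stereo 16-bit file in memory to decode.
frames_in = array.array("h", [1, -1, 2, -2, 3, -3])   # L0, R0, L1, R1, L2, R2
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(frames_in.tobytes())
buf.seek(0)

with wave.open(buf, "rb") as wav:
    nchannels = wav.getnchannels()
    samples = array.array("h")           # native signed 16-bit integers
    samples.frombytes(wav.readframes(wav.getnframes()))

# Same shape as read_wav's result: values[i] = [left_val, right_val]
values = [list(samples[i:i + nchannels]) for i in range(0, len(samples), nchannels)]
```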

[1] https://docs.python.org/3/library/wave.html
