Interpreting WAV Data

Question

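I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I've found the Python wave library and have been using that. However, I'm not sure how to interpret the data.

The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose it's possible that a frame is 2 samples, but the ambiguities do not end there.

96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds

However, iTunes reports the time as closer to 2 seconds, and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?

Additionally, how am I to know whether the bytes are signed or unsigned?

Many thanks!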

Answer 1

Thank you for your help! I got it working, and I'll post the solution here in case some other poor soul needs it:

import wave
import struct

def pcm_channels(wave_file):
    """Given a file-like object or file path representing a wave file,
    decompose it into its constituent PCM data streams.

    Input: A file like object or file path
    Output: A list of lists of integers representing the PCM coded data stream channels
        and the sample rate of the channels (mixed rate channels not supported)
    """
    stream = wave.open(wave_file,"rb")

    num_channels = stream.getnchannels()
    sample_rate = stream.getframerate()
    sample_width = stream.getsampwidth()
    num_frames = stream.getnframes()

    raw_data = stream.readframes( num_frames ) # Returns byte data
    stream.close()

    total_samples = num_frames * num_channels

    # "<" forces little-endian, the byte order WAV uses, on any platform
    if sample_width == 1:
        fmt = "<%iB" % total_samples # read unsigned chars
    elif sample_width == 2:
        fmt = "<%ih" % total_samples # read signed 2 byte shorts
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    integer_data = struct.unpack(fmt, raw_data)
    del raw_data # Keep memory tidy (who knows how big it might be)

    channels = [ [] for time in range(num_channels) ]

    for index, value in enumerate(integer_data):
        bucket = index % num_channels
        channels[bucket].append(value)

    return channels, sample_rate
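
A quick way to check a decoder like the one above is a round trip through an in-memory file. The following is a minimal sketch, not part of the original answer: the helper names make_test_wav and deinterleave are made up for illustration, it assumes 16-bit little-endian PCM, and it splits the channels with slicing instead of the append loop.

```python
import io
import struct
import wave


def make_test_wav(left, right, sample_rate=44100):
    """Build a 16-bit stereo WAV in memory (hypothetical helper for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)          # 2 bytes per sample = 16-bit
        out.setframerate(sample_rate)
        # Interleave as L0 R0 L1 R1 ... and pack as little-endian shorts.
        interleaved = [s for pair in zip(left, right) for s in pair]
        out.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))
    buf.seek(0)
    return buf


def deinterleave(wav_file):
    """Return (channels, sample_rate) for a 16-bit PCM WAV file object."""
    with wave.open(wav_file, "rb") as stream:
        num_channels = stream.getnchannels()
        sample_rate = stream.getframerate()
        if stream.getsampwidth() != 2:
            raise ValueError("This sketch only handles 16-bit audio.")
        raw = stream.readframes(stream.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Channel c owns every num_channels-th sample, starting at index c.
    channels = [list(samples[c::num_channels]) for c in range(num_channels)]
    return channels, sample_rate


channels, rate = deinterleave(make_test_wav([1, 2, 3], [-1, -2, -3]))
print(channels, rate)
```

Slicing with a step of num_channels does the same bucketing as index % num_channels, without a Python-level loop over every sample.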

Answer 2

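"Two channels" means stereo, so it makes no sense to add up the durations of the individual channels; that is why you are off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, and I quote:

8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.

This is part of the specification of the WAV format (actually of its superset, RIFF), so it does not depend on which library you use to handle the WAV file.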
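
To make the quoted rule concrete, here is a small sketch using the standard struct module. It decodes the same bytes once as unsigned 8-bit samples and once as signed 16-bit little-endian samples. The byte string is the frame printed in the question; subtracting 128 to centre 8-bit audio around zero is a common convention and an assumption here, not something the quoted text spells out.

```python
import struct

frame = b"\xc0\xff\xd0\xff"  # the 4-byte frame from the question

# 8-bit WAV samples are unsigned: format code "B" yields values in 0..255.
as_unsigned_8bit = struct.unpack("<4B", frame)

# 16-bit WAV samples are signed two's complement, little-endian: code "h".
as_signed_16bit = struct.unpack("<2h", frame)

# Shift unsigned 8-bit samples so that silence sits at 0 instead of 128.
centered_8bit = [value - 128 for value in as_unsigned_8bit]

print(as_unsigned_8bit)
print(as_signed_16bit)
print(centered_8bit)
```

Since the file in the question reports a sample width of 2 bytes, the 16-bit reading is the one that applies to it.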

Answer 3

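I know an answer has already been accepted, but I did some work with audio a while ago, and you have to unpack the wave data with something like this:

import struct
pcmdata = struct.unpack("<%dh" % wavedatalength, wavedata)

Also, one package I used was called PyAudio, though I still had to use the wave package with it.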

Answer 4

Each sample is 16 bits and there are 2 channels, so a frame takes 4 bytes.
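
As a rough illustration of that layout, the sketch below cuts a byte string into 4-byte frames and unpacks each frame into one sample per channel. The first frame is the one from the question; the second frame is made up for the example.

```python
import struct

num_channels = 2   # stereo, from the question
sample_width = 2   # bytes per sample (16-bit)
bytes_per_frame = num_channels * sample_width

# Two consecutive frames; the second one is invented for illustration.
raw = b"\xc0\xff\xd0\xff" + b"\x01\x00\x02\x00"

# Each frame holds one little-endian signed sample per channel.
frames = [
    struct.unpack("<%dh" % num_channels, raw[i:i + bytes_per_frame])
    for i in range(0, len(raw), bytes_per_frame)
]

print(bytes_per_frame)
print(frames)
```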

Answer 5

The duration is simply the number of frames divided by the number of frames per second. From your data, this is: 96333 / 44100 = 2.18 seconds.
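
The same arithmetic as a short sketch, using the parameters reported in the question. The function name duration_seconds is illustrative. The second calculation derives the duration from the size of the audio data instead, which should agree as long as only the sample bytes are counted and not the file header.

```python
def duration_seconds(num_frames, frame_rate):
    """Playback time: one frame covers all channels at a single instant."""
    return num_frames / frame_rate


num_frames = 96333
frame_rate = 44100   # frames per second
num_channels = 2
sample_width = 2     # bytes per sample

duration = duration_seconds(num_frames, frame_rate)

# Cross-check from the data size: bytes of audio / bytes per second.
data_bytes = num_frames * num_channels * sample_width
byte_rate = frame_rate * num_channels * sample_width
duration_from_size = data_bytes / byte_rate

print(round(duration, 2))
print(data_bytes)
```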

Answer 6

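Building on this answer, you can get a good performance boost by using numpy.fromstring (numpy.frombuffer in current NumPy versions) or numpy.fromfile. Also see this answer.

Here is what I did: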

import numpy as np

def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved = True):

    if sample_width == 1:
        dtype = np.uint8 # unsigned char
    elif sample_width == 2:
        dtype = np.int16 # signed 2-byte short
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    # frombuffer replaces the deprecated binary mode of np.fromstring;
    # it returns a read-only view of raw_bytes, so call .copy() before modifying
    channels = np.frombuffer(raw_bytes, dtype=dtype)

    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels.shape = (n_frames, n_channels)
        channels = channels.T
    else:
        # channels are not interleaved. All samples from channel M occur before any sample from channel M+1
        channels.shape = (n_channels, n_frames)

    return channels

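Assigning a new value to shape will raise an error if the reshape would require the data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). ndarray.T also avoids copying (i.e. it returns a view) where possible, but I'm not sure how to guarantee that it does not copy.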
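
One way to check for copies after the fact is np.shares_memory. The sketch below is an illustration with made-up sample values, and it uses reshape rather than assigning to shape. It confirms that the transposed array still points at the original buffer.

```python
import numpy as np

# Six made-up 16-bit samples, interleaved as L0 R0 L1 R1 L2 R2.
raw = np.arange(6, dtype=np.int16).tobytes()

flat = np.frombuffer(raw, dtype=np.int16)   # view onto the bytes, no copy
frames = flat.reshape(3, 2)                 # (n_frames, n_channels)
channels = frames.T                         # (n_channels, n_frames)

# True means the two arrays overlap in memory, i.e. no copy was made.
print(np.shares_memory(flat, channels))
# An array that is only a view does not own its data.
print(channels.flags["OWNDATA"])
print(channels.tolist())
```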

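Even better would be to use np.fromfile to read directly from the file, but you would have to skip the header using a custom dtype. I haven't tried this yet.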
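
A possible sketch of that idea follows. Instead of a custom dtype, it skips the header with the offset argument of np.fromfile, which is a swapped-in technique and not what the answer describes. The helper names find_data_chunk and read_pcm16 are hypothetical, and the code assumes 16-bit PCM with the channel count supplied by the caller. It walks the RIFF chunks to locate the audio data rather than assuming a fixed header size.

```python
import os
import struct
import tempfile
import wave

import numpy as np


def find_data_chunk(path):
    """Return (byte offset, byte length) of the 'data' chunk in a WAV file."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("Not a RIFF/WAVE file.")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("No data chunk found.")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"data":
                return f.tell(), chunk_size
            # Skip this chunk; chunks are padded to an even byte count.
            f.seek(chunk_size + (chunk_size & 1), os.SEEK_CUR)


def read_pcm16(path, n_channels):
    """Read 16-bit PCM samples straight from disk, one row per channel."""
    offset, length = find_data_chunk(path)
    samples = np.fromfile(path, dtype="<i2", count=length // 2, offset=offset)
    return samples.reshape(-1, n_channels).T


# Demo: write a tiny stereo file with the wave module, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.wav")
    with wave.open(path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(44100)
        out.writeframes(struct.pack("<6h", 1, -1, 2, -2, 3, -3))
    channels = read_pcm16(path, n_channels=2)

print(channels.tolist())
```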

6 answers

Answer 1: score 19, 2010-02-09 06:18:05
Answer 2: score 9, accepted, 2010-02-09 05:15:36
Answer 3: score 4, 2010-02-09 05:52:49
Answer 4: score 2, 2010-02-09 05:17:05
Answer 5: score 2, 2010-02-09 05:21:32
Answer 6: score 2, 2015-07-25 11:16:16
