
Interpreting WAV Data

I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I've found the Python wave library and have been using that. However, I'm not sure how to interpret the data.

The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose it's possible that a frame is 2 samples, but the ambiguities do not end there.

96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds

However, iTunes reports the time as closer to 2 seconds, and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?

Additionally, how am I to know if the bytes are signed or unsigned?

Many thanks!

Thank you for your help! I got it working and I'll post the solution here for everyone to use in case some other poor soul needs it:

import wave
import struct

def pcm_channels(wave_file):
    """Given a file-like object or file path representing a wave file,
    decompose it into its constituent PCM data streams.

    Input: A file like object or file path
    Output: A list of lists of integers representing the PCM coded data stream channels
        and the sample rate of the channels (mixed rate channels not supported)
    """
    stream = wave.open(wave_file,"rb")

    num_channels = stream.getnchannels()
    sample_rate = stream.getframerate()
    sample_width = stream.getsampwidth()
    num_frames = stream.getnframes()

    raw_data = stream.readframes( num_frames ) # Returns byte data
    stream.close()

    total_samples = num_frames * num_channels

    # WAV data is little-endian ("<"); 8-bit samples are unsigned, 16-bit are signed.
    if sample_width == 1:
        fmt = "<%dB" % total_samples # read unsigned chars
    elif sample_width == 2:
        fmt = "<%dh" % total_samples # read signed 2 byte shorts
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    integer_data = struct.unpack(fmt, raw_data)
    del raw_data # Keep memory tidy (who knows how big it might be)

    channels = [ [] for _ in range(num_channels) ] # one list per channel

    for index, value in enumerate(integer_data):
        bucket = index % num_channels
        channels[bucket].append(value)

    return channels, sample_rate

"Two channels" means stereo, so it makes no sense to sum each channel's duration; you're off by a factor of two (2.18 seconds, not 4.37). As for signedness, it is explained for example here, and I quote:

8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.

This is part of the specs of the WAV format (actually of its superset RIFF) and thus not dependent on what library you're using to deal with a WAV file.
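As a small illustration of the quoted rule, the sketch below decodes the 4-byte frame from the question with the standard struct module. It assumes the usual WAV layout (little-endian, left channel first); the variable names are only for illustration.

```python
import struct

frame = b"\xc0\xff\xd0\xff"  # the 4-byte frame printed in the question

# "<" = little-endian, "h" = signed 16-bit; one value per channel.
left, right = struct.unpack("<2h", frame)
print(left, right)  # -64 -48

# Reading the same bytes as unsigned ("H") shows why signedness matters.
unsigned = struct.unpack("<2H", frame)
print(unsigned)  # (65472, 65488)

# 8-bit WAV samples are unsigned with the midpoint at 128;
# subtracting 128 centres them on zero.
eight_bit = struct.unpack("3B", b"\x00\x80\xff")
centred = [s - 128 for s in eight_bit]
print(centred)  # [-128, 0, 127]
```

Read as signed values, the frame is a pair of small negative amplitudes, which is what you would expect from quiet audio; read as unsigned, the same bytes look like near-maximum values.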

I know that an answer has already been accepted, but I did some things with audio a while ago, and you have to unpack the wave data with something like this:

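import struct  # use struct directly rather than relying on wave.struct
pcmdata = struct.unpack("<%dh" % wavedatalength, wavedata)  # wavedatalength = number of 16-bit samples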

Also, one package that I used was called PyAudio, though I still had to use the wave package with it.

Each sample is 16 bits and there are 2 channels, so a frame takes 4 bytes.

The duration is simply the number of frames divided by the number of frames per second. From your data this is: 96333 / 44100 = 2.18 seconds.
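The same arithmetic can be sketched with the wave module itself. The helper name wav_duration_seconds is hypothetical, and the in-memory file below is silent test data built only to match the parameters in the question.

```python
import io
import wave

def wav_duration_seconds(wav_source):
    """Return (frame_size_bytes, duration_seconds) for a WAV path or file-like object."""
    with wave.open(wav_source, "rb") as stream:
        # One frame holds one sample for every channel.
        frame_size = stream.getnchannels() * stream.getsampwidth()
        duration = stream.getnframes() / stream.getframerate()
    return frame_size, duration

# Build a silent in-memory WAV with the parameters from the question.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(b"\x00" * (96333 * 4))
buffer.seek(0)

frame_size, duration = wav_duration_seconds(buffer)
print(frame_size, round(duration, 2))  # 4 2.18
```

Note that the channel count affects the frame size but not the duration, which is why multiplying by 2 samples per frame doubles the result incorrectly.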

Building upon this answer, you can get a good performance boost by using numpy.fromstring or numpy.fromfile. (In current NumPy the binary mode of numpy.fromstring is deprecated; numpy.frombuffer is its replacement.) Also see this answer.

Here is what I did:

import numpy as np

def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved = True):

    if sample_width == 1:
        dtype = np.uint8 # unsigned char
    elif sample_width == 2:
        dtype = np.int16 # signed 2-byte short
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

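    # np.frombuffer replaces the deprecated binary mode of np.fromstring; the result is a read-only view
    channels = np.frombuffer(raw_bytes, dtype=dtype)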

    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels.shape = (n_frames, n_channels)
        channels = channels.T
    else:
        # channels are not interleaved. All samples from channel M occur before all samples from channel M+1
        channels.shape = (n_channels, n_frames)

    return channels

Assigning a new value to shape will throw an error if it requires data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). The ndarray.T attribute likewise returns a view of the same data rather than a copy.
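One way to check that no copy happened is np.shares_memory. The sketch below is an illustration with made-up sample bytes, not the code from this answer; it uses reshape instead of assigning to shape, and reshape returns a view here because the buffer is contiguous.

```python
import numpy as np

# Two interleaved stereo frames of 16-bit little-endian samples (made-up data).
raw = bytes([0xC0, 0xFF, 0xD0, 0xFF, 0x10, 0x00, 0x20, 0x00])

flat = np.frombuffer(raw, dtype="<i2")  # read-only view onto the bytes
frames = flat.reshape(2, 2)             # (n_frames, n_channels)
channels = frames.T                     # (n_channels, n_frames), still a view

shares = np.shares_memory(flat, channels)
left = channels[0].tolist()
right = channels[1].tolist()
print(shares, left, right)  # True [-64, 16] [-48, 32]
```

If the result is True, the per-channel array is only a different indexing of the original buffer, so deinterleaving costs no extra memory.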

Reading directly from the file with np.fromfile will be even better, but you would have to skip the header using a custom dtype. I haven't tried this yet.
