Interpreting WAV data

Question

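I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I found the Python wave library and have been using that. However, I'm not sure how to interpret the data.

The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). That all looks cheerful, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose a frame might be 2 samples, but the ambiguity doesn't end there.

96333 frames * 2 samples/frame * (1/44.1k seconds/sample) = 4.3688 seconds

However, iTunes reports a time closer to 2 seconds, and a calculation based on file size and bitrate gives about 2.7 seconds. What's going on here?

Also, how do I know whether the bytes are signed or unsigned?

Many thanks!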

Answer 1

Thanks for your help! I got it working, and I'm posting my solution here in case some other poor soul needs it:

import wave
import struct

def pcm_channels(wave_file):
    """Given a file-like object or file path representing a wave file,
    decompose it into its constituent PCM data streams.

    Input: A file like object or file path
    Output: A list of lists of integers representing the PCM coded data stream channels
        and the sample rate of the channels (mixed rate channels not supported)
    """
    stream = wave.open(wave_file,"rb")

    num_channels = stream.getnchannels()
    sample_rate = stream.getframerate()
    sample_width = stream.getsampwidth()
    num_frames = stream.getnframes()

    raw_data = stream.readframes( num_frames ) # Returns byte data
    stream.close()

    total_samples = num_frames * num_channels

    if sample_width == 1: 
        fmt = "<%iB" % total_samples # read unsigned chars
    elif sample_width == 2:
        fmt = "<%ih" % total_samples # read signed 2 byte shorts (WAV data is little-endian)
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    integer_data = struct.unpack(fmt, raw_data)
    del raw_data # Keep memory tidy (who knows how big it might be)

    channels = [[] for _ in range(num_channels)]

    for index, value in enumerate(integer_data):
        bucket = index % num_channels
        channels[bucket].append(value)

    return channels, sample_rate
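As a self-contained check of the same approach, the sketch below writes a tiny stereo 16-bit WAV into an in-memory buffer (io.BytesIO) and reads it back. It replaces the per-sample bucket loop with extended slicing, since in interleaved data every num_channels-th sample belongs to the same channel. The helper name read_channels and the test samples are illustrative assumptions, not part of the answer above.

```python
import io
import struct
import wave

def read_channels(wave_file):
    """Return (channels, sample_rate) for an 8- or 16-bit PCM WAV path or file object."""
    with wave.open(wave_file, "rb") as stream:
        num_channels = stream.getnchannels()
        sample_rate = stream.getframerate()
        sample_width = stream.getsampwidth()
        raw_data = stream.readframes(stream.getnframes())

    total_samples = len(raw_data) // sample_width
    if sample_width == 1:
        fmt = "<%dB" % total_samples   # 8-bit: unsigned 0..255
    elif sample_width == 2:
        fmt = "<%dh" % total_samples   # 16-bit: signed, little-endian
    else:
        raise ValueError("Only 8- and 16-bit PCM is handled here.")
    samples = struct.unpack(fmt, raw_data)

    # Interleaved layout is L0 R0 L1 R1 ..., so channel c starts at index c
    # and then takes every num_channels-th sample.
    channels = [list(samples[c::num_channels]) for c in range(num_channels)]
    return channels, sample_rate

# Hypothetical test file: 3 stereo frames written into memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(struct.pack("<6h", -64, -48, 100, -100, 32767, -32768))
buf.seek(0)

channels, rate = read_channels(buf)
print(channels, rate)  # [[-64, 100, 32767], [-48, -100, -32768]] 44100
```

Slicing does the de-interleaving in C inside the tuple implementation, which is usually faster than appending one value at a time.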

Answer 2

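"Two channels" means stereo, so it makes no sense to add up the durations of the two channels; that is why you are off by a factor of two (2.18 seconds, not 4.37 seconds). As for signedness, as explained here, to quote:

8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.

This is part of the specification of the WAV format (really of its superset, RIFF), so it does not depend on the library you use to handle WAV files.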
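To make the two conventions concrete, here is a minimal sketch that maps raw sample values onto a common range of about -1.0 to 1.0. The helper name to_float is a hypothetical choice; the only facts it relies on are the ones quoted above: 8-bit samples are unsigned with silence at 128, so 128 is subtracted first, while 16-bit samples are already signed and centred on 0.

```python
def to_float(sample, sample_width):
    """Map one raw PCM sample value to the range [-1.0, 1.0)."""
    if sample_width == 1:
        # 8-bit WAV: unsigned 0..255, silence at 128
        return (sample - 128) / 128.0
    if sample_width == 2:
        # 16-bit WAV: two's-complement signed -32768..32767, silence at 0
        return sample / 32768.0
    raise ValueError("Only 8- and 16-bit PCM is handled here.")

print(to_float(0, 1), to_float(128, 1), to_float(255, 1))    # -1.0 0.0 0.9921875
print(to_float(-32768, 2), to_float(0, 2), to_float(-64, 2))  # -1.0 0.0 -0.001953125
```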

Answer 3

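I know an answer has already been accepted, but a while ago I did some audio work, and you have to do something like the following, where wavedata is the raw byte string of frames and wavedatalength is the number of 16-bit samples in it (len(wavedata) // 2):

import struct
pcmdata = struct.unpack("<%dh" % wavedatalength, wavedata)  # "<" = little-endian, as in WAV files

Also, one package I used is called PyAudio, though I still had to use the wave package as well.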

Answer 4

Each sample is 16 bits and there are 2 channels, so a frame takes 4 bytes.
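For example, the 4-byte frame printed in the question decodes into one sample per channel. This sketch assumes the standard stereo order (left, then right) and uses struct's "<" prefix because WAV sample data is little-endian.

```python
import struct

frame = b"\xc0\xff\xd0\xff"                 # the frame printed in the question
left, right = struct.unpack("<hh", frame)   # two signed 16-bit little-endian samples
print(left, right)  # -64 -48
```

Both values are small negative numbers, i.e. the audio is close to silence at that point.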

Answer 5

The duration is simply the number of frames divided by the number of frames per second. From your data, that is: 96333 / 44100 = 2.18 seconds.
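The same calculation can be done straight from the header with the wave module. In this small sketch the helper name wav_duration is an assumption, and the test file simply recreates the parameters from the question with silent samples. Note that the channel count does not appear in the formula: one frame already holds one sample for every channel.

```python
import io
import wave

def wav_duration(wave_file):
    """Return the duration in seconds: frames / frames per second."""
    with wave.open(wave_file, "rb") as stream:
        return stream.getnframes() / stream.getframerate()

# Recreate the question's parameters: 2 channels, 2 bytes/sample, 44100 Hz, 96333 frames of silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(bytes(96333 * 2 * 2))
buf.seek(0)

duration = wav_duration(buf)
print(round(duration, 2))  # 2.18
```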

Answer 6

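Based on this answer, you can get a good performance boost by using numpy.fromstring or numpy.fromfile (in current NumPy, numpy.frombuffer replaces numpy.fromstring for binary data). See also this answer.

Here's what I did: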

import numpy as np

def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved=True):

    if sample_width == 1:
        dtype = np.uint8 # unsigned char
    elif sample_width == 2:
        dtype = np.dtype("<i2") # signed 2-byte short, little-endian as stored in WAV files
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    channels = np.frombuffer(raw_bytes, dtype=dtype) # zero-copy, read-only view of the bytes

    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels.shape = (n_frames, n_channels)
        channels = channels.T
    else:
        # channels are not interleaved. All samples from channel M occur before all samples from channel M+1
        channels.shape = (n_channels, n_frames)

    return channels

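If the data would have to be copied to fit the new shape, assigning a new value to shape raises an error. That is a good thing, because you want to use the data in place (less time and less memory overall). ndarray.T does not copy either: it is an attribute that always returns a view with the axes swapped, so only the strides change.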
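One way to confirm that no copy was made is numpy.shares_memory, which reports whether two arrays overlap in memory. The sketch below, assuming interleaved stereo 16-bit data built from a few hand-made samples, repeats the frombuffer, reshape and .T steps and checks that the final array still points into the original buffer. It uses reshape rather than shape assignment, and lets shares_memory do the verification.

```python
import numpy as np

# Hypothetical test data: 3 frames of interleaved stereo 16-bit samples (L0 R0 L1 R1 L2 R2).
raw_bytes = np.array([-64, -48, 100, -100, 32767, -32768], dtype="<i2").tobytes()

flat = np.frombuffer(raw_bytes, dtype="<i2")  # 1-D array backed directly by raw_bytes
channels = flat.reshape(3, 2).T               # (n_frames, n_channels) -> (n_channels, n_frames)

print(channels.tolist())                 # [[-64, 100, 32767], [-48, -100, -32768]]
print(np.shares_memory(channels, flat))  # True: no copy was made
```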

Reading the file directly with np.fromfile would be even better, but you would have to skip the header with a custom dtype. I haven't tried that yet.
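The header-skipping idea can also be sketched without a custom dtype: walk the RIFF chunks to find where the 'data' chunk starts, then pass that position to the offset argument of np.fromfile (available since NumPy 1.17). This is a minimal sketch under stated assumptions: it handles plain 16-bit PCM only, trusts the chunk sizes in the header, and the helper names find_data_chunk and read_pcm16 are hypothetical.

```python
import os
import struct
import tempfile
import wave

import numpy as np

def find_data_chunk(path):
    """Return (offset, size) in bytes of the 'data' chunk payload of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'data' chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"data":
                return f.tell(), size
            f.seek(size + (size & 1), 1)  # skip the payload plus a pad byte for odd sizes

def read_pcm16(path, n_channels):
    """Read interleaved 16-bit PCM straight from disk into an (n_channels, n_frames) array."""
    offset, size = find_data_chunk(path)
    flat = np.fromfile(path, dtype="<i2", count=size // 2, offset=offset)
    return flat.reshape(-1, n_channels).T

# Usage: write a small stereo file with the wave module, then read it back with NumPy only.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.wav")
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(44100)
        w.writeframes(struct.pack("<6h", -64, -48, 100, -100, 32767, -32768))
    channels = read_pcm16(path, n_channels=2)

print(channels.tolist())  # [[-64, 100, 32767], [-48, -100, -32768]]
```

Walking the chunks rather than assuming a fixed 44-byte header matters because many WAV files carry extra chunks (for example LIST metadata) before the samples.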

6 answers

Answer 1: 19 votes, 2010-02-09 06:18:05
Answer 2: 9 votes, accepted, 2010-02-09 05:15:36
Answer 3: 4 votes, 2010-02-09 05:52:49
Answer 4: 2 votes, 2010-02-09 05:17:05
Answer 5: 2 votes, 2010-02-09 05:21:32
Answer 6: 2 votes, 2015-07-25 11:16:16
