Why does PyAudio cut off audio from NumPy arrays?

Question

I accidentally forgot to convert some NumPy arrays to bytes objects when using PyAudio, but to my surprise it still played audio, even if it sounded a bit off. I wrote a little test script (see below) for playing 1 second of a 440Hz tone, and it seems like writing a NumPy array directly to a PyAudio Stream cuts that tone short.

Can anyone explain why this happens? I thought a NumPy array was a contiguous sequence of bytes with some header information about its dtype and strides, so I would've predicted that PyAudio played the full second of the tone after some garbled audio from the header, not cut the tone off.

# script segment
import pyaudio
import numpy as np
RATE = 48000

p = pyaudio.PyAudio()
stream = p.open(format = pyaudio.paFloat32, channels = 1, rate = RATE, output = True)

TONE = 440
SECONDS = 1
t = np.arange(0, 2*np.pi*TONE*SECONDS, 2*np.pi*TONE/RATE) 
sina = np.sin(t).astype(np.float32)
sinb = sina.tobytes()

# console commands segment
stream.write(sinb) # bytes object plays 1 second of 440Hz tone
stream.write(sina) # still plays 440Hz tone, but noticeably shorter than 1 second

Answer 1

The problem is more subtle than you describe. Your first call passes a bytes object of length 192,000. The second call passes a NumPy array of 48,000 float32 values. pyaudio handles both of them, and passes the buffer to portaudio to be played.

However, when you opened the stream, you told pyaudio you were sending paFloat32 data, which has 4 bytes per sample. The pyaudio write handler takes the length of the object you gave it, and divides by the number of channels times the sample size to determine how many audio samples there are. In your second call, the length of the array is 48,000, which it divides by 4, and thereby tells portaudio "there are 12,000 samples here". At 48,000 samples per second, that is only a quarter of a second of audio.
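
The arithmetic can be checked without opening an audio device. The sketch below assumes, as this answer describes, that the frame count is guessed as the length of the object divided by channels times sample size. `inferred_frames` is a hypothetical helper written for illustration, not a PyAudio function, and a silent array stands in for the tone.

```python
import numpy as np

RATE = 48000
CHANNELS = 1
SAMPLE_SIZE = 4  # bytes per paFloat32 sample


def inferred_frames(data, channels=CHANNELS, sample_size=SAMPLE_SIZE):
    """Hypothetical helper mirroring the length-based guess described above."""
    return len(data) // (channels * sample_size)


sina = np.zeros(RATE, dtype=np.float32)  # stand-in for the 1-second tone
sinb = sina.tobytes()

print(len(sinb), inferred_frames(sinb))  # 192000 48000
print(len(sina), inferred_frames(sina))  # 48000 12000
print(inferred_frames(sina) / RATE)      # 0.25 (seconds of audio)
```

The point is that `len()` counts bytes for a bytes object but counts elements for a NumPy array, so the same guess comes out four times too small for the array.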

So everyone understood the format, but was confused about the size. If you change the second call to

stream.write(sina, 48000)

then no one has to guess, and it works perfectly fine.
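
One way to avoid hard-coding 48000 is to derive the frame count from the array itself. This is only a minimal sketch: `write_array` is a hypothetical helper, not part of PyAudio, and it assumes float32 output with interleaved channels, so the number of frames is the element count divided by the channel count. It converts to bytes and also passes the frame count, so nothing is left to be inferred.

```python
import numpy as np


def write_array(stream, samples, channels=1):
    """Hypothetical helper: write a NumPy array with an explicit frame count.

    `stream` is assumed to be an output stream opened with paFloat32.
    """
    samples = np.ascontiguousarray(samples, dtype=np.float32)
    num_frames = samples.size // channels  # one frame = one sample per channel
    stream.write(samples.tobytes(), num_frames)
    return num_frames
```

With the stream and array from the question, `write_array(stream, sina)` would pass all 48,000 frames rather than 12,000.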

Answer 1: 2, accepted, 2021-04-05 21:24:38
