
Recording audio for a specific amount of time with PyAudio?

I am trying to learn about audio capture and recording in Python, in this case with PyAudio. While looking through a few examples, I came across this one:

import pyaudio
import wave

CHUNK = 2
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 3
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print(int(RATE / CHUNK * RECORD_SECONDS))

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

I think I have a rough understanding of what CHUNK, FORMAT, CHANNELS and RATE mean and do, but I don't understand how recording for a specific amount of time works. If I were to change the value of CHUNK from 2 to 4, the value of int(RATE / CHUNK * RECORD_SECONDS) would be halved. But if I then run the code, the recording still lasts for the 3 seconds specified.

Ultimately, how can this for loop take the same amount of time to execute when the range is halved?

Sorry if this doesn't make sense; it feels like a stupid question.

Edit: Manually changing the number of samples read per iteration, without changing the range the for loop iterates over (so the range stays constant at range(0, 60000) while the argument in data = stream.read(CHUNK) varies), does change the time taken to record. So doubling the samples read in each iteration doubles the time taken. Does that mean it simply takes twice as long to process the data? If so, wouldn't the time taken vary between computers depending on the processing power available?

CHUNK is the number of samples (per channel) in a block of data. I would call this the "block size". Sound cards and sound drivers typically don't process samples one at a time; they work on, well, chunks. The block size is typically a few hundred samples, e.g. 512 or 1024. Only if you need very low latency should you try smaller block sizes, like 64 or 32 samples. A block size of 2 typically doesn't work well.

RATE is the sampling rate, i.e. the number of samples per second. 44100 Hertz is a typical sampling rate from the era of CDs; nowadays you'll also often see 48000 Hertz.
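
To put rough numbers on the block sizes mentioned above, here is a minimal sketch that converts a block size into the time one block spans at a given sampling rate. The helper name block_duration_ms is made up for this illustration and is not part of PyAudio.

```python
def block_duration_ms(block_size, rate=44100):
    """Time covered by one block of audio, in milliseconds.

    Hypothetical helper for illustration; not part of PyAudio.
    """
    return 1000.0 * block_size / rate


# Block sizes taken from the discussion above.
for block_size in (2, 32, 64, 512, 1024):
    print(f"{block_size:5d} samples -> {block_duration_ms(block_size):7.3f} ms")
```

At 44100 Hz a block of 1024 samples spans roughly 23 ms, while a block of 2 samples spans well under a tenth of a millisecond, which leaves very little time to handle each block.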

The for loop in your example reads blocks of data (or "chunks" if you prefer) from the audio hardware. If you want to record 3 seconds of audio, you need to record 3 * RATE samples. To get the number of blocks, you divide that by the block size CHUNK.

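If you change the value of CHUNK, the duration of the whole recording doesn't change (apart from some truncation done by int()), but the number of times the for loop runs does. Each call to stream.read(CHUNK) waits until the hardware has delivered CHUNK samples, which takes CHUNK / RATE seconds of real time, so the total duration is set by the sampling rate and not by the processing power of the computer.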
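
The arithmetic behind the loop can be checked without any audio hardware. The sketch below uses a made-up helper, recording_plan, that repeats the calculation from the question's for loop and also reports how much audio the loop captures once int() has truncated the block count.

```python
def recording_plan(rate, chunk, seconds):
    """Return (number of blocks, seconds of audio actually captured).

    Hypothetical helper for illustration; it mirrors the expression
    int(RATE / CHUNK * RECORD_SECONDS) from the question's code.
    """
    num_blocks = int(rate / chunk * seconds)      # loop iterations
    captured_seconds = num_blocks * chunk / rate  # audio covered by those blocks
    return num_blocks, captured_seconds


for chunk in (2, 4, 1024):
    blocks, captured = recording_plan(44100, chunk, 3)
    print(f"CHUNK={chunk:5d}: {blocks:6d} blocks, {captured:.4f} s")
```

With CHUNK = 2 and CHUNK = 4 the loop runs 66150 and 33075 times respectively, and both capture exactly 3 seconds. With CHUNK = 1024 the count is truncated to 129 blocks, which is about 2.995 seconds.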

If you are willing to use NumPy, there is a much simpler way to record a few seconds of audio into a WAV file: use the sounddevice module to record the audio data and the soundfile module to save it:

import sounddevice as sd
import soundfile as sf

samplerate = 44100  # Hertz
duration = 3  # seconds
filename = 'output.wav'

mydata = sd.rec(int(samplerate * duration), samplerate=samplerate,
                channels=2, blocking=True)
sf.write(filename, mydata, samplerate)

BTW, you don't need to specify the block size if you have no reason to. The underlying library (PortAudio) will automatically choose one for you.
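
To confirm that a recording has the intended length, the frame count stored in the WAV header can be divided by the sampling rate. The sketch below is only an illustration using the standard-library wave module. The helper name wav_duration_seconds and the file name silence.wav are made up, and a file of silence stands in for a real recording so that the code runs without a microphone.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration of a WAV file: number of frames divided by the frame rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Stand-in for a real recording: 3 seconds of 16-bit stereo silence.
rate, channels, sample_width, seconds = 44100, 2, 2, 3
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(channels)
    wf.setsampwidth(sample_width)
    wf.setframerate(rate)
    wf.writeframes(b"\x00" * (rate * seconds * channels * sample_width))

print(wav_duration_seconds(path))
```

Calling wav_duration_seconds("output.wav") on the file written by either example above should report a value close to 3.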
