
Recording audio for specific amount of time with PyAudio?

I am trying to learn about audio capture/recording using Python and in this case PyAudio. I am taking a look at a few examples and came across this one:

import pyaudio
import wave

CHUNK = 2
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 3
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print(int(RATE / CHUNK * RECORD_SECONDS))

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

I think I have a rough understanding of what CHUNK, FORMAT, CHANNELS and RATE all mean and do, but I don't understand how recording for specific amounts of time works. If I was to change the value of CHUNK from 2 to 4, the value of int(RATE / CHUNK * RECORD_SECONDS) would be halved. But then if I was to run the code, the recording will still occur for the 3 seconds specified.

Ultimately, how can this for loop execute in the same amount of time when the range is halved?

Sorry if I don't make sense, it feels like a stupid question.

Edit: Changing the number of samples read per iteration without changing the range the for loop iterates over (so it stays constant at range(0, 60000) while the CHUNK in data = stream.read(CHUNK) varies) does change the time taken to record. So doubling the samples read each iteration doubles the time taken. Does that mean it just takes twice as long to process the data? But if so, wouldn't the time taken vary between computers depending on the processing power available?

CHUNK is the number of samples in a block of data (strictly speaking frames, i.e. one sample per channel, since it is passed as frames_per_buffer). I would call this "block size". Sound cards and sound drivers typically don't process one sample after the other; they work on, well, chunks. The block size is typically a few hundred samples, e.g. 512 or 1024. You should only try smaller block sizes, like 64 or 32 samples, if you need very low latency. A block size of 2 typically doesn't work well.
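To put rough numbers on the latency point, here is a minimal sketch of how much audio a single block holds. The function name block_duration_ms is made up for illustration, and the figure is only the buffering delay of one block, not the total latency of a real system, which also depends on the driver and hardware.

```python
def block_duration_ms(block_size, samplerate):
    """Return how many milliseconds of audio one block of frames covers."""
    return 1000.0 * block_size / samplerate


if __name__ == "__main__":
    # Compare a few block sizes at the CD sampling rate.
    for block_size in (2, 32, 64, 512, 1024):
        ms = block_duration_ms(block_size, 44100)
        print(f"{block_size:5d} frames -> {ms:.3f} ms per block")
```

With a block size of 2, each block covers well under a tenth of a millisecond, so the loop has to come back for more data tens of thousands of times per second, which is why such a small value tends to cause trouble.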

RATE is the sampling rate, i.e. the number of samples per second. 44100 Hertz is a typical sampling rate from the era of CDs; nowadays you'll also often see 48000 Hertz.

The for-loop in your example is reading blocks of data (or "chunks" if you prefer) from the audio hardware. If you want to record 3 seconds of audio, you'll need to record 3 * RATE samples. To get the number of blocks, you have to divide that by the block size CHUNK.

If you change the value of CHUNK, this doesn't change the duration of the whole recording (apart from some truncation done by int()), but it does change the number of times the for-loop runs. Each call to stream.read(CHUNK) blocks until CHUNK samples have arrived from the hardware, which delivers them at RATE samples per second. The time the loop takes is therefore set by the sampling rate and not by how fast your computer is.
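The arithmetic can be checked without any audio hardware. The following is only a sketch: the helper name plan_recording is hypothetical, and it just reproduces the int(RATE / CHUNK * RECORD_SECONDS) expression from the question to show how many blocks are read and how long the recording ends up being.

```python
def plan_recording(rate, chunk, seconds):
    """Return (number of blocks, frames recorded, actual duration in seconds).

    Mirrors the loop bound from the question:
    int(RATE / CHUNK * RECORD_SECONDS).
    """
    num_blocks = int(rate / chunk * seconds)  # iterations of the for-loop
    frames_recorded = num_blocks * chunk      # each iteration reads `chunk` frames
    actual_seconds = frames_recorded / rate   # hardware delivers `rate` frames per second
    return num_blocks, frames_recorded, actual_seconds


if __name__ == "__main__":
    for chunk in (2, 4, 1024):
        blocks, frames, duration = plan_recording(44100, chunk, 3)
        print(f"CHUNK={chunk:5d}: {blocks:6d} blocks, {frames} frames, {duration:.4f} s")
```

Going from CHUNK = 2 to CHUNK = 4 halves the number of iterations, but each iteration waits for twice as many frames, so the total stays at 3 seconds. With CHUNK = 1024 the int() truncation drops a partial block and the recording comes out slightly shorter than 3 seconds.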

If you are willing to use NumPy, there is a much simpler way to record a few seconds of audio into a WAV file: Use the sounddevice module to record the audio data and the soundfile module to save it to a WAV file:

import sounddevice as sd
import soundfile as sf

samplerate = 44100  # Hertz
duration = 3  # seconds
filename = 'output.wav'

mydata = sd.rec(int(samplerate * duration), samplerate=samplerate,
                channels=2, blocking=True)
sf.write(filename, mydata, samplerate)

BTW, you don't need to specify the block size if you have no reason for it. The underlying library (PortAudio) will automatically choose one for you.
