slow down spoken audio (not from mp3/wav) using python
I need to slow down short bursts of spoken audio captured from a microphone and play them back in near real time from a Python script. I can capture and replay the audio with PyAudio using an input and an output stream without changing the speed, but I don't know how to slow it down.
I have seen this post that uses pydub to do something similar with audio from a file, but I don't know how to adapt it for my purpose.
Just to emphasise the key point in the question title, "(not from mp3/wav or any other file type)": I want to do this in near real time over short durations, ideally <= ~0.1 s, so I only want to process the data read from the PyAudio stream.
Does anyone with pydub experience know whether it can do what I need?
Note: I realise the output will fall progressively behind and that there may be buffering issues, but I am only doing this for short periods of up to 30 seconds, and I only want to slow the speech down by about 10%.
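(For scale: a chunk of N frames at sample rate R covers N/R seconds, so the ~0.1 s target works out to a few thousand frames at a typical mic rate. The names below are illustrative only:)

```python
RATE = 44100           # typical mic sample rate in Hz (assumption)
TARGET_LATENCY = 0.1   # seconds of audio per processed chunk, per the question
CHUNK = int(RATE * TARGET_LATENCY)   # frames per chunk
print(CHUNK)  # 4410 frames covers ~0.1 s at 44.1 kHz
```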
So it turns out to be very, very simple.
Once I looked into the pydub and pyaudio code bases, I realised that by simply specifying a lower value for the 'rate' parameter on the output audio stream (speaker) than on the input audio stream (mic), the stream.write() function handles it for me.
I had been expecting to have to physically manipulate the raw data to stretch it into a larger buffer.
Here is a simple example:
import pyaudio

FORMAT = pyaudio.paInt16
CHANNELS = 1
FRAME_RATE = 44100
CHUNK = 1024 * 4

# simply modify the value for the 'rate' parameter to change the playback speed
# <1 === slow down; >1 === speed up
FRAMERATE_OFFSET = 0.8

audio = pyaudio.PyAudio()

# output stream (speaker)
stream_out = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=int(FRAME_RATE * FRAMERATE_OFFSET),
                        output=True)

# open input stream to start recording mic audio
stream_in = audio.open(format=FORMAT,
                       channels=CHANNELS,
                       rate=FRAME_RATE,
                       input=True)

for i in range(1):
    # modify the chunk multiplier below to capture longer time durations
    data = stream_in.read(CHUNK * 25)
    stream_out.write(data)

stream_in.stop_stream()
stream_in.close()
stream_out.stop_stream()
stream_out.close()
audio.terminate()
To productionise this I will need to set up a shared memory data buffer and a subprocess to handle the output, so that I don't miss anything significant in the incoming signal.
Here is what I did.
import wave

channels = 1
swidth = 2
multiplier = 0.2

spf = wave.open('flute-a4.wav', 'rb')
fr = spf.getframerate()  # frame rate
signal = spf.readframes(-1)

wf = wave.open('ship.wav', 'wb')
wf.setnchannels(channels)
wf.setsampwidth(swidth)
wf.setframerate(fr * multiplier)
wf.writeframes(signal)
wf.close()
I used the flute sample from this repo.
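The shared buffer plus output subprocess mentioned above could be sketched along these lines. This is a simplified stand-in, not the author's actual code: a thread and a queue.Queue replace the real shared memory and subprocess, and fixed byte strings replace live PyAudio reads, so the producer/consumer pattern is runnable without audio hardware:

```python
import queue
import threading

buf = queue.Queue()  # shared buffer between the capture loop and playback worker

def playback_worker():
    # Drain chunks and hand them to the (slower) output stream;
    # here we just collect them to demonstrate the pattern.
    played = []
    while True:
        chunk = buf.get()
        if chunk is None:          # sentinel: capture has finished
            break
        played.append(chunk)       # stream_out.write(chunk) in the real script
    playback_worker.result = b''.join(played)

t = threading.Thread(target=playback_worker)
t.start()

# capture loop: in the real script each chunk comes from stream_in.read(CHUNK)
for _ in range(3):
    buf.put(b'\x00' * 1024)
buf.put(None)
t.join()
```

Because the queue decouples the two loops, the capture side never blocks on the slowed-down playback.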
As mentioned in the comments, you can speed up or slow down audio simply by increasing or decreasing the sampling frequency / frame rate. If you intend to do this in real time from a microphone, one idea is to record for a few seconds, play back the slowed audio, and then record again.
Here is an example using sounddevice, which is basically a slight modification of my answer here. We record audio for 4 seconds, 3 times in a loop, and immediately play it back with a frame-rate offset (> 1 to speed up, < 1 to slow down). A 1-second delay is added so the audio playback can finish before we start a new chunk.
import sounddevice as sd
import time

fs = 44100
duration = 4  # seconds
# fs_offset = 1.3  # speed up
fs_offset = 0.8  # slow down

for count in range(1, 4):
    myrecording = sd.rec(duration * fs, samplerate=fs, channels=2, dtype='float64')
    print("Recording Audio chunk {} for {} seconds".format(count, duration))
    sd.wait()
    print("Recording complete, Playing chunk {} with offset {}".format(count, fs_offset))
    sd.play(myrecording, fs * fs_offset)
    sd.wait()
    print("Playing chunk {} Complete".format(count))
    time.sleep(1)
Output:
$python sdaudio.py
Recording Audio chunk 1 for 4 seconds
Recording complete, Playing chunk 1 with offset 0.8
Playing chunk 1 Complete
Recording Audio chunk 2 for 4 seconds
Recording complete, Playing chunk 2 with offset 0.8
Playing chunk 2 Complete
Recording Audio chunk 3 for 4 seconds
Recording complete, Playing chunk 3 with offset 0.8
Playing chunk 3 Complete
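If you did want to manipulate the raw samples instead of changing the playback rate, the equivalent effect comes from resampling the data itself. Here is a naive pure-Python linear-interpolation stretch, illustrative only and not part of any of the answers above; as with the frame-rate trick, the pitch drops along with the speed:

```python
def stretch(samples, factor):
    """Return len(samples) * factor samples via linear interpolation.

    factor > 1 stretches the signal (slower playback at a fixed rate),
    factor < 1 compresses it (faster playback).
    """
    n_out = int(len(samples) * factor)
    out = []
    for i in range(n_out):
        pos = i / factor                       # position in the input signal
        j = int(pos)
        frac = pos - j
        a = samples[min(j, len(samples) - 1)]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a * (1 - frac) + b * frac)  # linear interpolation
    return out

# a factor of 1.25 turns 4 samples into 5 (toy signal for illustration)
print(stretch([0.0, 1.0, 2.0, 3.0], 1.25))  # [0.0, 0.8, 1.6, 2.4, 3.0]
```

In practice you would apply this to the int16 samples decoded from the PyAudio buffer, then play the stretched result at the original rate.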
Here is an example that records from the microphone with PyAudio and plays back with pydub. You could also use PyAudio's blocking-stream facilities to modify the outgoing audio, but I used pydub since you mentioned a pydub-based solution. The code pattern is from here.
import pyaudio
import wave
from pydub import AudioSegment
from pydub.playback import play
import time

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 4
# FRAMERATE_OFFSET = 1.4  # speed up
FRAMERATE_OFFSET = 0.7  # slow down
WAVE_OUTPUT_FILENAME = "file.wav"


def get_audio():
    audio = pyaudio.PyAudio()

    # start recording
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    # stop recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # save to file with the offset frame rate
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(int(RATE * FRAMERATE_OFFSET))
    waveFile.writeframes(b''.join(frames))
    waveFile.close()


for count in range(1, 4):
    print("recording segment {} ....".format(count))
    get_audio()
    print("Playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    audio_chunk = AudioSegment.from_wav(WAVE_OUTPUT_FILENAME)
    play(audio_chunk)
    print("Finished playing segment {} .... at offset {}".format(count, FRAMERATE_OFFSET))
    time.sleep(1)
Output:
$python slowAudio.py
recording segment 1 ....
Playing segment 1 .... at offset 0.7
Finished playing segment 1 .... at offset 0.7
recording segment 2 ....
Playing segment 2 .... at offset 0.7
Finished playing segment 2 .... at offset 0.7
recording segment 3 ....
Playing segment 3 .... at offset 0.7
This question has already been answered here.
from pydub import AudioSegment

sound = AudioSegment.from_file(…)


def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })

    # convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)


slow_sound = speed_change(sound, 0.75)
fast_sound = speed_change(sound, 2.0)
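The frame_rate override above is metadata-only: the samples are untouched and only the advertised playback rate changes, so the duration scales by 1/speed. That can be verified with nothing but the stdlib wave module (the scratch file names are illustrative):

```python
import wave

# write one second of silence at 44.1 kHz (hypothetical scratch file)
with wave.open('demo.wav', 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(44100)
    wf.writeframes(b'\x00\x00' * 44100)

# reread the same samples and rewrite them with the frame rate scaled by 0.75
with wave.open('demo.wav', 'rb') as rf:
    frames = rf.readframes(rf.getnframes())

with wave.open('demo_slow.wav', 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(int(44100 * 0.75))
    wf.writeframes(frames)

with wave.open('demo_slow.wav', 'rb') as rf:
    duration = rf.getnframes() / rf.getframerate()
print(round(duration, 3))  # same samples, slower clock: 1 s becomes ~1.333 s
```

The trailing set_frame_rate() call in speed_change then resamples this slowed audio back to a standard rate so ordinary players handle it, without undoing the change in duration.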