
Real-time audio signal processing using Python

I have been trying to do real-time audio signal processing using the PyAudio module in Python. As a simple first case, I read audio data from the microphone and play it back through headphones. I tried the following code (both Python and Cython versions). It works, but the playback stalls and is not smooth enough. How can I improve the code so that it runs smoothly? My PC has an i7 CPU and 8 GB of RAM.

Python version

import pyaudio
import numpy as np

RATE    = 16000
CHUNK   = 256

p               =   pyaudio.PyAudio()

player = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True, frames_per_buffer=CHUNK)
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True, frames_per_buffer=CHUNK)

for i in range(int(20*RATE/CHUNK)): #do this for 10 seconds
player.write(np.fromstring(stream.read(CHUNK),dtype=np.int16))
stream.stop_stream()
stream.close()
p.terminate()

Cython version

import pyaudio
import numpy as np

cdef int RATE   = 16000
cdef int CHUNK  = 1024
cdef int i      
p               =   pyaudio.PyAudio()

player = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True, frames_per_buffer=CHUNK)
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True, frames_per_buffer=CHUNK)

for i in range(500): #do this for 10 seconds
    player.write(np.fromstring(stream.read(CHUNK),dtype=np.int16))
stream.stop_stream()
stream.close()
p.terminate()

I believe you are missing CHUNK as the second argument to the player.write call.

player.write(np.frombuffer(stream.read(CHUNK), dtype=np.int16), CHUNK)
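One possible reason the explicit count matters (this is my reading of PyAudio's Stream.write, so treat it as an assumption): when num_frames is omitted, it seems to be derived from len(frames) divided by channels times the sample width in bytes. That is right for a bytes object, but a 16-bit sample array reports its length in samples, so the derived count would come out halved. The sketch below reproduces only that arithmetic with the standard-library array module; frames_from_len is a hypothetical helper, not a PyAudio function.

```python
from array import array

SAMPLE_WIDTH = 2   # bytes per paInt16 sample
CHANNELS = 1
CHUNK = 256


def frames_from_len(frames, channels=CHANNELS, width=SAMPLE_WIDTH):
    """Hypothetical helper mirroring the assumed fallback: len(frames) / (channels * width)."""
    return int(len(frames) / (channels * width))


raw = bytes(CHUNK * SAMPLE_WIDTH)   # same size as one stream.read(CHUNK) result: 512 bytes
samples = array("h", raw)           # 256 int16 samples, analogous to the NumPy array in the question

frames_from_bytes = frames_from_len(raw)        # length counted in bytes
frames_from_samples = frames_from_len(samples)  # length counted in samples
print(frames_from_bytes, frames_from_samples)
```

If that reading is correct, passing CHUNK explicitly, or writing the bytes returned by stream.read directly, avoids the mismatch.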

Also, I am not sure whether it is just a formatting error, but player.write needs to be indented so that it sits inside the for loop.

The PyAudio site writes the loop count as RATE / CHUNK * RECORD_SECONDS. In Python, * and / have equal precedence and are evaluated left to right, so RECORD_SECONDS * RATE / CHUNK yields the same count; what matters is that the factor matches the duration you want. The loop as posted uses 20, so it runs for about 20 seconds rather than the 10 stated in its original comment.

for i in range(int(20*RATE/CHUNK)):  # runs for about 20 seconds; use 10*RATE/CHUNK for 10
    player.write(np.frombuffer(stream.read(CHUNK), dtype=np.int16), CHUNK)

stream.stop_stream()
stream.close()
p.terminate()
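To check the loop-count arithmetic: * and / share the same precedence in Python and associate left to right, so both orderings give the same number of chunks. A small sketch, using a hypothetical helper chunk_count together with the question's RATE and CHUNK values:

```python
RATE = 16000
CHUNK = 256
RECORD_SECONDS = 10


def chunk_count(rate, chunk, seconds):
    """Number of whole chunks to read for roughly `seconds` of audio."""
    return int(rate / chunk * seconds)


count_a = chunk_count(RATE, CHUNK, RECORD_SECONDS)   # RATE / CHUNK * RECORD_SECONDS
count_b = int(RECORD_SECONDS * RATE / CHUNK)         # RECORD_SECONDS * RATE / CHUNK
print(count_a, count_b)
```

Each chunk holds CHUNK / RATE seconds of audio, so the count scales linearly with the requested duration.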

Finally, you may want to increase RATE to 44100, CHUNK to 1024, and the number of channels to 2 for better fidelity.

The code below takes the default input device and plays whatever it records through the default output device.

import time

import pyaudio
import numpy as np

p = pyaudio.PyAudio()

CHANNELS = 2
RATE = 44100

def callback(in_data, frame_count, time_info, flag):
    # use NumPy to convert the raw bytes to an array for processing
    # audio_data = np.frombuffer(in_data, dtype=np.float32)
    return in_data, pyaudio.paContinue

stream = p.open(format=pyaudio.paFloat32,
                channels=CHANNELS,
                rate=RATE,
                output=True,
                input=True,
                stream_callback=callback)

stream.start_stream()

while stream.is_active():
    time.sleep(20)
    stream.stop_stream()
    print("Stream is stopped")

stream.close()

p.terminate()

This will run for 20 seconds and stop. The callback function is where you can process the signal, for example by first converting it to an array: audio_data = np.frombuffer(in_data, dtype=np.float32) (np.frombuffer replaces the deprecated np.fromstring for binary data).

return in_data is where you send the post-processed data back to the output device.
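As a rough illustration of processing inside the callback, the sketch below decodes the incoming bytes as 32-bit floats, applies a gain, and returns the re-encoded bytes. It uses the standard-library array module instead of NumPy so that it runs without extra dependencies; GAIN is a hypothetical processing step, and PA_CONTINUE is a stand-in for pyaudio.paContinue so the snippet can be tried without an audio device.

```python
from array import array

PA_CONTINUE = 0   # stand-in for pyaudio.paContinue (assumed here; use the real constant with PyAudio)
GAIN = 0.5        # hypothetical processing step: halve the amplitude


def callback(in_data, frame_count, time_info, flag):
    """Decode float32 samples, scale them, and hand the bytes back for playback."""
    samples = array("f")
    samples.frombytes(in_data)                        # raw bytes -> 32-bit float samples
    processed = array("f", (s * GAIN for s in samples))
    return processed.tobytes(), PA_CONTINUE


# Offline check with a fake two-frame stereo buffer
fake_input = array("f", [0.5, -1.0, 0.25, 0.0]).tobytes()
out_bytes, out_flag = callback(fake_input, 2, None, 0)
```

The returned buffer has the same length as the input, which keeps the frame count the output side expects unchanged.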

Note that the chunk size has a default value of 1024, as noted in the PyAudio docs: http://people.csail.mit.edu/hubert/pyaudio/docs/#pyaudio.PyAudio.open

I am working on a similar project. I modified your code and the stalls are now gone. The bigger the chunk, the bigger the delay, which is why I kept it small.

import pyaudio
import numpy as np

CHUNK = 2**5
RATE = 44100
LEN = 10

p = pyaudio.PyAudio()

stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True, frames_per_buffer=CHUNK)
player = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, output=True, frames_per_buffer=CHUNK)


for i in range(int(LEN*RATE/CHUNK)):  # run for LEN seconds
    data = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
    player.write(data, CHUNK)


stream.stop_stream()
stream.close()
p.terminate()
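The chunk-size trade-off mentioned above can be put into numbers: one buffer of CHUNK frames takes CHUNK / RATE seconds to fill, which is a lower bound on the delay that buffering adds. The sketch below uses a hypothetical helper, chunk_latency_ms, and ignores driver and hardware latency, which come on top.

```python
def chunk_latency_ms(chunk, rate):
    """Time needed to fill one buffer of `chunk` frames at `rate` Hz, in milliseconds."""
    return 1000.0 * chunk / rate


small = chunk_latency_ms(2**5, 44100)   # the 32-frame chunk used in the code above
large = chunk_latency_ms(1024, 44100)   # a 1024-frame chunk for comparison
print(round(small, 2), round(large, 2))
```

A very small chunk keeps the delay low but leaves less time per iteration for reading, processing, and writing, so the right value likely depends on the machine and on how heavy the processing is.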

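Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.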

 