Python Audio Frame Pitch Change

I'm attempting to use pyaudio to make a voice masker. With my current setup, all I have to do is read the sound in, change the pitch on the fly, and write each chunk straight back out. The first and last parts are working, and I think I'm getting close to changing the pitch... emphasis on the "think".

Unfortunately, I'm not very familiar with the type of data I'm working with or how exactly to manipulate it the way I want. I've gone through the audioop documentation and haven't found what I need (though there are some things in there I could definitely use). I guess what I'm asking is...

How is the data formatted in these audio frames?

How can I change the pitch of a frame (if I can), or does it even work anything like that?

import pyaudio
import sys
import numpy as np
import wave
import audioop
import struct

chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 41000
RECORD_SECONDS = 5

p = pyaudio.PyAudio()

stream = p.open(format = FORMAT,
                channels = CHANNELS,
                rate = RATE,
                input = True,
                output = True,
                frames_per_buffer = chunk)

swidth = 2

print "* recording"



while(True):

    data = stream.read(chunk)
    data = np.array(wave.struct.unpack("%dh"%(len(data)/swidth), data))*2

    data = np.fft.rfft(data)
    #MANipulation
    data = np.fft.irfft(data)



    stream.write(data3, chunk)




print "* done"

stream.stop_stream()
stream.close()
p.terminate()

After the irfft line and before the stream.write line, you need to convert the data back into 16-bit integers with a struct.pack call.

data = np.fft.irfft(data)
dataout = np.array(data*0.5, dtype='int16') #undo the *2 that was done at reading
chunkout = struct.pack("%dh"%(len(dataout)), *list(dataout)) #convert back to 16-bit data
stream.write(chunkout)
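
On the question of how the frames are formatted: with paInt16 and one channel, each buffer returned by stream.read is a byte string holding one signed 16-bit integer (2 bytes) per sample. The following is a minimal sketch of the same round trip using NumPy instead of struct, assuming mono audio in native byte order. The helper names bytes_to_samples and samples_to_bytes are made up for illustration, and the packed test buffer stands in for real microphone data.

```python
import struct
import numpy as np

def bytes_to_samples(raw):
    """Interpret a paInt16 buffer as a 1-D array of signed 16-bit samples."""
    return np.frombuffer(raw, dtype=np.int16)

def samples_to_bytes(samples):
    """Round, clip to the int16 range, and serialize back to raw bytes."""
    clipped = np.clip(np.rint(samples), -32768, 32767)
    return clipped.astype(np.int16).tobytes()

# Stand-in for what stream.read(4) would return for mono paInt16 audio.
raw = struct.pack("4h", 0, 1000, -1000, 32767)

samples = bytes_to_samples(raw)

# FFT followed by inverse FFT yields floats, so they must be converted
# back to int16 bytes before being handed to stream.write.
restored = samples_to_bytes(np.fft.irfft(np.fft.rfft(samples)))
```

Clipping before the cast is a design choice in this sketch: once the spectrum has been modified, the inverse FFT can produce values outside the 16-bit range, and a plain cast would wrap around instead of saturating.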

To change the pitch, you'll have to perform an FFT on a number of frames, shift the data in frequency (move it to different frequency bins), and then perform an inverse FFT.
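
As a rough sketch of what "moving the data to different frequency bins" could look like for a single block, the hypothetical helper below (shift_bins is not from the original post) slides the rfft spectrum up or down by a whole number of bins. It is only an illustration under simple assumptions: it shifts every frequency by the same amount, which does not preserve harmonic ratios, and it processes each block on its own, so real audio would likely need windowing and overlap between blocks to avoid audible seams.

```python
import numpy as np

def shift_bins(samples, shift):
    """Move every rfft bin up (shift > 0) or down (shift < 0) by `shift` bins."""
    spectrum = np.fft.rfft(samples)
    shifted = np.zeros_like(spectrum)
    if shift > 0:
        shifted[shift:] = spectrum[:-shift]   # bins pushed past the top are dropped
    elif shift < 0:
        shifted[:shift] = spectrum[-shift:]   # bins pushed below zero are dropped
    else:
        shifted[:] = spectrum
    # Ask for the original length explicitly so the block size is preserved.
    return np.fft.irfft(shifted, n=len(samples))

# Test tone with all of its energy in bin 10 of a 1024-sample block.
n = 1024
t = np.arange(n)
tone = np.sin(2 * np.pi * 10 * t / n)

moved = shift_bins(tone, 5)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(moved))))  # strongest bin after the shift
```

In the question's loop, a call like this would take the place of the "#MANipulation" comment, operating on the unpacked samples rather than on the raw bytes.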

If you don't mind the sound fragment getting longer when you lower the pitch (or shorter when you raise it), you could resample the frames instead. For instance, you could double each frame (insert a copy of each frame into the stream), thereby lowering both the playback speed and the pitch. You can then improve the audio quality by refining the resampling algorithm to use some sort of interpolation and/or filtering.
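
A minimal sketch of the resampling idea, using linear interpolation rather than plain duplication. The function name resample and its factor parameter are assumptions for illustration; no filtering is applied, so shortening a block this way can introduce aliasing.

```python
import numpy as np

def resample(samples, factor):
    """Stretch a block by `factor` using linear interpolation.

    factor > 1 lengthens the block (lower pitch at the same playback rate);
    factor < 1 shortens it (higher pitch).
    """
    samples = np.asarray(samples, dtype=np.float64)
    new_len = int(round(len(samples) * factor))
    # Positions in the original block that each output sample is read from.
    positions = np.linspace(0, len(samples) - 1, new_len)
    return np.interp(positions, np.arange(len(samples)), samples)

block = np.array([0.0, 10.0, 20.0, 30.0])
slower = resample(block, 2.0)   # twice as many samples, plays back lower
faster = resample(block, 0.5)   # half as many samples, plays back higher
```

Because the output length differs from the input length, a live stream that lowers the pitch this way produces audio faster than it can be played, which is the trade-off described above.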
