简体   繁体   中英

Python Audio Frame Pitch Change

I'm attempting to use pyaudio to make a voice masker. With the way I have it set up right now, the only thing I have to do is input the sound, change the pitch on the fly, and chunk it right back out. The first and last part are working, and I think I'm getting close to changing pitch... emphasis on the "think".

Unfortunately, I'm not too familiar with the type of data I'm working with and how exactly to manipulate it the way I want. I've gone through the audioop documentation and havn't found what I needed (thought there are some things I could definately use in there). I guess what I'm asking is...

How is the data formatted in these audio frames.

How can I change the pitch of a frame (if I can), or is it even close to working like that?

import pyaudio
import sys
import numpy as np
import wave
import audioop
import struct

chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 41000
RECORD_SECONDS = 5

p = pyaudio.PyAudio()

stream = p.open(format = FORMAT,
                channels = CHANNELS,
                rate = RATE,
                input = True,
                output = True,
                frames_per_buffer = chunk)

swidth = 2

print "* recording"



while(True):

    data = stream.read(chunk)
    data = np.array(wave.struct.unpack("%dh"%(len(data)/swidth), data))*2

    data = np.fft.rfft(data)
    #MANipulation
    data = np.fft.irfft(data)



    stream.write(data3, chunk)




print "* done"

stream.stop_stream()
stream.close()
p.terminate()

After the irfft line, and before the stream.write line, you need to convert the data back into 16-bit integers with a struct.pack call.

data = np.fft.irfft(data)
dataout = np.array(data*0.5, dtype='int16') #undo the *2 that was done at reading
chunkout = struct.pack("%dh"%(len(dataout)), *list(dataout)) #convert back to 16-bit data
stream.write(chunkout)

To change the pitch, you'll have to perform an FFT on a number of frames and then shift the data in frequency (move the data to different frequency bins) and perform an inverse FFT.

If you don't mind the sound fragment getting longer while lowering the pitch (or higher when increasing the pitch), you could resample the frames. For instance, you could double each frame (insert a copy of each frame in the stream) thereby lowering the playback speed and the pitch. You can then improve the audio quality by improving the resampling algorithm to use some sort of interpolation and/or filtering.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM