
Python pitch shifting with Windows

I am attempting to create a sort of autotune/pitch correction algorithm in Python. I can detect the pitch in each rectangular window. To test whether this approach would work at all, I tried shifting the pitch of each window (of size 512) by 2 semitones. Doing this, however, creates a huge amount of feedback-like noise in the returned audio. I assume this is because I am using a rectangular window instead of a Hann (Hanning) window. My question is: how do I implement pitch correction on the individual windows while also removing this feedback?

Code:

import librosa
import numpy as np
import IPython.display as ipd

# Load the vocals at their native sample rate
samples, sr = librosa.load('my_raw_vocals.wav', sr=None)

def manipulate(data, sampling_rate, pitch_factor):
    # Shift by pitch_factor semitones; librosa 0.10+ requires sr and
    # n_steps to be passed as keyword arguments
    return librosa.effects.pitch_shift(data, sr=sampling_rate, n_steps=pitch_factor)

def block(array, size):
    # Split the signal into consecutive, non-overlapping blocks of `size`
    # samples (a rectangular window); the last block keeps any remainder
    array = np.asarray(array)
    return [array[i:i + size] for i in range(0, len(array), size)]

block512 = block(samples, 512)

summation = []

rate = 2  # shift in semitones
for frame in block512:
    altered_frame = manipulate(frame, sr, rate)
    summation.append(altered_frame)

# The last block can be shorter than 512 samples, so concatenate the list
# instead of converting a ragged list to an array and flattening it
frame_pitched512 = np.concatenate(summation)
ipd.Audio(frame_pitched512, rate=sr)
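The loop above shifts every block by a fixed 2 semitones. For actual pitch correction, the shift for each block would instead come from that block's detected pitch. A minimal sketch, assuming the per-block pitch estimates are already available in Hz (NaN for unvoiced blocks) and that the target is the nearest note of 12-tone equal temperament with A4 = 440 Hz; semitone_correction is a hypothetical helper, not a librosa function:

```python
import numpy as np

def semitone_correction(f0_hz, ref_hz=440.0):
    """Return the shift in semitones that moves each pitch to the nearest
    equal-tempered note (hypothetical helper; NaN input gives NaN)."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Distance from the reference pitch in fractional semitones
    semitones = 12.0 * np.log2(f0_hz / ref_hz)
    # Nearest whole semitone minus the current position = required shift
    return np.round(semitones) - semitones

# A note 0.4 semitones sharp of A#4 needs a shift of about -0.4 semitones
print(semitone_correction([440.0, 440.0 * 2 ** (1.4 / 12), np.nan]))
```

Each returned value could then be passed as the pitch_factor of manipulate for the corresponding block, in place of the constant rate.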

The actual audio and code are here:

https://colab.research.google.com/drive/1cpRhPpvXY_9XZidjOLKk_wW15EnkqLEX?usp=sharing

Answer:

A few things I noticed about the code as a whole:

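1- A Hamming window is a better choice for sound processing, and a rectangular window is without doubt the worst option: cutting the signal abruptly at the block edges causes spectral leakage and audible clicks at every block boundary.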

2- Normalize the output, i.e. array / max(abs(array)), in order to get an acceptable result.

3- Apply pre-emphasis, but only when the signal is speech.

4- Another important point is to use robust pitch detection, which can be implemented in many ways. For example:

It is unusual for the pitch to change by more than 30% from one frame to the next, so estimates that jump too far up or down are usually octave errors, i.e. 2x or 0.5x the actual pitch. Apply pitch tracking across frames to get reliable results.
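Regarding point 1: a tapered window only removes the clicks at block edges if the blocks also overlap, so that the faded-out end of one block is filled in by the faded-in start of the next (overlap-add). A minimal NumPy sketch, assuming a periodic Hann window (swapped in for Hamming because its 50%-overlapped copies sum to exactly 1) and a per-block process callable that returns a block of the same length; overlap_add is a hypothetical helper name:

```python
import numpy as np

def overlap_add(x, frame_size=512, process=lambda frame: frame):
    """Process x in 50%-overlapping Hann-windowed frames and overlap-add them.

    Hypothetical helper; frame_size should be even. With a periodic Hann
    window and hop = frame_size // 2 the shifted windows sum to exactly 1,
    so the identity process returns x unchanged.
    """
    x = np.asarray(x, dtype=float)
    hop = frame_size // 2
    # Periodic Hann window: drop the last point of an (N+1)-point symmetric one
    window = np.hanning(frame_size + 1)[:-1]
    # Pad so that every original sample is covered by exactly two frames
    padded = np.concatenate([np.zeros(hop), x, np.zeros(frame_size)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - frame_size + 1, hop):
        frame = padded[start:start + frame_size] * window
        out[start:start + frame_size] += process(frame)
    # Strip the padding again
    return out[hop:hop + len(x)]
```

With the question's code this might be called as overlap_add(samples, 512, lambda f: manipulate(f, sr, 2)). Note that shifting each short block independently can still leave phase mismatches between neighbouring blocks, and librosa.effects.pitch_shift already frames the signal internally with its own STFT (default n_fft of 2048, larger than a 512-sample block), so for a constant shift it is simpler to call it once on the whole signal.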
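For point 2, peak normalization divides the signal by its largest absolute sample value so that the result peaks at a chosen level. A small sketch; the peak parameter and the guard for a silent signal are assumptions of this sketch, and peak_normalize is a hypothetical helper name:

```python
import numpy as np

def peak_normalize(x, peak=1.0):
    """Scale x so that max(abs(x)) == peak (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    max_abs = np.max(np.abs(x))
    # An all-zero (silent) signal cannot be scaled; return it unchanged
    if max_abs == 0:
        return x
    return x * (peak / max_abs)
```

For example, frame_pitched512 = peak_normalize(frame_pitched512, 0.99) keeps the processed signal inside the [-1, 1] range before it is played back or written to a file.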
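Point 3 most likely refers to the standard first-order pre-emphasis filter y[n] = x[n] - a*x[n-1], which boosts the high frequencies of speech. A sketch, assuming the commonly used coefficient a = 0.97 (a typical choice, not a value from the answer); the matching de-emphasis filter undoes it after processing. Both function names are hypothetical:

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])

def de_emphasis(y, alpha=0.97):
    """Inverse filter: x[n] = y[n] + alpha * x[n-1]."""
    y = np.asarray(y, dtype=float)
    x = np.empty_like(y)
    x[0] = y[0]
    for n in range(1, len(y)):
        x[n] = y[n] + alpha * x[n - 1]
    return x
```

The Python loop in de_emphasis is slow for long recordings; scipy.signal.lfilter([1], [1, -alpha], y) computes the same recursive filter.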
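Point 4 can be sketched as a simple post-processing pass over the per-frame pitch estimates. This is only one basic form of tracking, assuming the 30% limit from the answer and that oversized jumps are octave errors (2x or 0.5x); fix_octave_jumps is a hypothetical helper name:

```python
import numpy as np

def fix_octave_jumps(f0, max_jump=0.3):
    """Correct frame-to-frame pitch jumps that look like octave errors.

    Hypothetical helper. f0 holds one estimate per frame in Hz; NaN or
    non-positive values mark unvoiced frames and are left unchanged.
    """
    f0 = np.asarray(f0, dtype=float).copy()
    prev = None
    for i, f in enumerate(f0):
        if not np.isfinite(f) or f <= 0:
            continue  # unvoiced frame: keep comparing against the last voiced one
        if prev is not None and abs(f - prev) / prev > max_jump:
            # Try the estimate itself, double and half; keep the closest one
            candidates = np.array([f, 2.0 * f, 0.5 * f])
            best = candidates[np.argmin(np.abs(candidates - prev))]
            # Only accept the fix if it brings the jump back within the limit
            if abs(best - prev) / prev <= max_jump:
                f = best
                f0[i] = f
        prev = f
    return f0
```

The estimates could come from librosa.pyin, which returns NaN for unvoiced frames by default and already smooths its output with Viterbi decoding; a median filter over the corrected track is another common smoothing step.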
