Autocorrelation code in Python produces errors (guitar pitch detection)

This link provides code for an autocorrelation-based pitch detection algorithm. I am using it to detect pitches in simple guitar melodies.

In general, it produces very good results. For example, for the melody C4, C#4, D4, D#4, E4 it outputs:

262.743653536
272.144441273
290.826273006
310.431336809
327.094621169

These correspond to the correct notes.

However, in some cases, like this audio file (E4, F4, F#4, G4, G#4, A4, A#4, B4), it produces errors:

325.861452246
13381.6439242
367.518651703
391.479384923
414.604661221
218.345286173
466.503751322
244.994090035

More specifically, there are three errors here: 13381 Hz is detected instead of F4 (~350 Hz), which is a weird error, and 218 Hz is detected instead of A4 (440 Hz) and 244 Hz instead of B4 (~493 Hz), which are octave errors.
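To make the octave errors easier to spot, here is a minimal sketch that maps a detected frequency to the nearest equal-tempered note. The helper name `nearest_note` and the A4 = 440 Hz reference are assumptions for illustration, not part of the linked code.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) of the equal-tempered
    note closest to freq_hz. Hypothetical helper, assumes A4 = a4_hz."""
    # MIDI note number as a float; A4 is MIDI note 69
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    cents = 100 * (midi - nearest)
    # MIDI octave convention: note 60 is C4
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents

# Three of the values printed above
for f in (325.861452246, 218.345286173, 244.994090035):
    print(f, nearest_note(f))
```

With this mapping, 218 Hz and 244 Hz come out as A3 and B3, exactly one octave below the notes that were played.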

I assume the two kinds of error have different causes? Here is the code:

import librosa
import numpy as np

def segment_signal(y, sr, onset_frames=None, offset=0.1):
  # remove_dense_onsets is defined elsewhere (not shown)
  if onset_frames is None:
    onset_frames = remove_dense_onsets(librosa.onset.onset_detect(y=y, sr=sr))

  # Length of each analysis segment, in samples
  offset_samples = int(librosa.time_to_samples(offset, sr=sr))

  print(onset_frames)

  # One fixed-length slice of the signal per detected onset
  slices = np.array([y[i : i + offset_samples] for i
    in librosa.frames_to_samples(onset_frames)])

  return slices

slices = segment_signal(y, sr)
for segment in slices:
  pitch = freq_from_autocorr(segment, sr)
  print(pitch)

You can see the freq_from_autocorr function in the first link above.

The only thing that I have changed is this line:

corr = corr[len(corr)/2:]

Which I have replaced with:

corr = corr[int(len(corr)/2):]
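Under Python 3, `/` always returns a float, and a float cannot be used as a slice index, which is the likely reason for this change (an inference; the question does not say). The sketch below uses a stand-in array rather than a real autocorrelation and shows that floor division `//` gives the same index without the `int()` call.

```python
import numpy as np

corr = np.arange(9.0)        # stand-in for a 'full' autocorrelation (odd length)
half_float = len(corr) / 2   # 4.5 under Python 3
half_int = len(corr) // 2    # 4, the index of the zero-lag sample

try:
    corr[half_float:]        # float slice indices are rejected
    float_slice_ok = True
except (TypeError, IndexError):
    float_slice_ok = False

positive_lags = corr[half_int:]  # same result as corr[int(len(corr)/2):]
print(float_slice_ok, positive_lags)
```

A 'full' autocorrelation of N samples has 2N - 1 values, so the middle index N - 1 is always the zero-lag sample.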

UPDATE:

I noticed that the smaller the offset I use (that is, the shorter the signal segment I use to detect each pitch), the more high-frequency (10000+ Hz) errors I get.

Specifically, I noticed that the part that behaves differently in those cases (10000+ Hz) is the calculation of the i_peak value. In cases with no error it is in the range 50-150; in the error cases it is 3-5.
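The relation between the peak lag and the reported frequency is simply fs / lag, so a lag of 3-5 samples can only produce a frequency far above any guitar note. The sketch below assumes a 44100 Hz sampling rate (an inference: 44100 / 13381.6 is about 3.3, which matches the i_peak values above, but the rate is not stated) and a plausible pitch range of 80-1200 Hz (also an assumption). Both helper names are hypothetical.

```python
import math

def lag_to_freq(lag, fs):
    """Frequency (Hz) implied by an autocorrelation peak at `lag` samples."""
    return fs / lag

def plausible_lag_range(fs, fmin, fmax):
    """Lags (in samples) that correspond to fundamentals in [fmin, fmax]."""
    return math.floor(fs / fmax), math.ceil(fs / fmin)

fs = 44100  # assumed sampling rate, not stated in the question
for lag in (3, 5, 50, 150):
    print(lag, lag_to_freq(lag, fs))

# Peaks outside this lag range can be rejected before converting to Hz
print(plausible_lag_range(fs, fmin=80.0, fmax=1200.0))
```

Restricting the peak search to such a lag range is one simple way to rule out the 10000+ Hz results, although it does nothing about octave errors.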

The autocorrelation function in the code snippet that you linked is not particularly robust. To get the correct result, it needs to locate the first peak on the left-hand side of the autocorrelation curve (the first one after the zero-lag maximum). The method that the other developer used (calling the numpy.argmax() function) does not always find the correct value.

I've implemented a slightly more robust version, using the peakutils package. I don't promise that it's perfectly robust either, but in any case it achieves a better result than the version of the freq_from_autocorr() function that you were previously using.

My example solution is listed below:

import librosa
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import fftconvolve
from pprint import pprint
import peakutils

def freq_from_autocorr(signal, fs):
    # Calculate autocorrelation (same thing as convolution, but with one input
    # reversed in time), and throw away the negative lags
    signal = signal - np.mean(signal)  # Remove DC offset (on a copy, so the caller's array is untouched)
    corr = fftconvolve(signal, signal[::-1], mode='full')
    corr = corr[len(corr)//2:]

    # Find the first peak on the left
    i_peak = peakutils.indexes(corr, thres=0.8, min_dist=5)[0]
    i_interp = parabolic(corr, i_peak)[0]

    return fs / i_interp, corr, i_interp

def parabolic(f, x):
    """
    Quadratic interpolation for estimating the true position of an
    inter-sample maximum when nearby samples are known.

    f is a vector and x is an index for that vector.

    Returns (vx, vy), the coordinates of the vertex of a parabola that goes
    through point x and its two neighbors.

    Example:
    Defining a vector f with a local maximum at index 3 (= 6), find local
    maximum if points 2, 3, and 4 actually defined a parabola.

    In [3]: f = [2, 3, 1, 6, 4, 2, 3, 1]

    In [4]: parabolic(f, argmax(f))
    Out[4]: (3.2142857142857144, 6.1607142857142856)
    """
    xv = 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
    yv = f[x] - 1/4. * (f[x-1] - f[x+1]) * (xv - x)
    return (xv, yv)

# Time window after initial onset (in units of seconds)
window = 0.1

# Open the file and obtain the sampling rate
y, sr = librosa.core.load("./Vocaroo_s1A26VqpKgT0.mp3")
idx = np.arange(len(y))

# Set the window size in terms of number of samples
winsamp = int(window * sr)

# Calculate the onset frames in the usual way
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onstm = librosa.frames_to_time(onset_frames, sr=sr)

fqlist = [] # List of estimated frequencies, one per note
crlist = [] # List of autocorrelation arrays, one array per note
iplist = [] # List of interpolated peak indices, one per note
for tm in onstm:
    startidx = int(tm * sr)
    freq, corr, ip = freq_from_autocorr(y[startidx:startidx+winsamp], sr)
    fqlist.append(freq)
    crlist.append(corr)
    iplist.append(ip)    

pprint(fqlist)

# Choose which notes to plot (it's set to show all 8 notes in this case)
plidx = [0, 1, 2, 3, 4, 5, 6, 7]

# Plot amplitude curves of all notes in the plidx list 
fgwin = plt.figure(figsize=[8, 10])
fgwin.subplots_adjust(bottom=0.0, top=0.98, hspace=0.3)
axwin = []
ii = 1
for tm in onstm[plidx]:
    axwin.append(fgwin.add_subplot(len(plidx)+1, 1, ii))
    startidx = int(tm * sr)
    axwin[-1].plot(np.arange(startidx, startidx+winsamp), y[startidx:startidx+winsamp])
    ii += 1
axwin[-1].set_xlabel('Sample ID Number', fontsize=18)
fgwin.show()

# Plot autocorrelation function of all notes in the plidx list
fgcorr = plt.figure(figsize=[8,10])
fgcorr.subplots_adjust(bottom=0.0, top=0.98, hspace=0.3)
axcorr = []
ii = 1
for cr, ip in zip([crlist[ii] for ii in plidx], [iplist[ij] for ij in plidx]):
    if ii == 1:
        shax = None
    else:
        shax = axcorr[0]
    axcorr.append(fgcorr.add_subplot(len(plidx)+1, 1, ii, sharex=shax))
    axcorr[-1].plot(np.arange(500), cr[0:500])
    # Plot the location of the leftmost peak
    axcorr[-1].axvline(ip, color='r')
    ii += 1
axcorr[-1].set_xlabel('Time Lag Index (Zoomed)', fontsize=18)
fgcorr.show()

The printed output looks like:

In [1]: %run autocorr.py
[325.81996740236065,
 346.43374761017725,
 367.12435233192753,
 390.17291696559079,
 412.9358117076161,
 436.04054933498134,
 465.38986619237039,
 490.34120132405866]

The first figure produced by my code sample depicts the amplitude curves for the 0.1 seconds following each detected onset time:

[Figure: guitar note amplitudes]

The second figure produced by the code shows the autocorrelation curves, as computed inside the freq_from_autocorr() function. The vertical red lines mark the location of the first peak on the left for each curve, as estimated by the peakutils package. The method used by the other developer was getting incorrect results for some of these red lines; that's why his version of that function was occasionally returning the wrong frequencies.

[Figure: guitar note autocorrelation curves]

My suggestion would be to test the revised version of the freq_from_autocorr() function on other recordings, see if you can find more challenging examples where even the improved version still gives incorrect results, and then get creative and try to develop an even more robust peak-finding algorithm that never, ever misfires.
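One way to start on that kind of testing without recordings is to run a peak finder on synthetic notes whose frequency is known. The following is only a minimal sketch under stated assumptions: it uses a NumPy-only peak finder in place of peakutils, all three function names are hypothetical, and the 0.8 relative-height threshold is only loosely analogous to the thres=0.8 setting above. The synthetic note has a strong second harmonic, which puts a small bump at half the period that the finder has to skip.

```python
import numpy as np

def autocorr_nonneg_lags(x):
    """Autocorrelation of x for lags >= 0, after removing the mean."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")
    return full[len(full) // 2:]

def first_significant_peak(corr, rel_height=0.8):
    """Index of the first local maximum after the zero-lag lobe whose
    height is at least rel_height * corr[0]; None if there is none."""
    negative = np.flatnonzero(corr < 0)
    if negative.size == 0:
        return None
    start = int(negative[0])          # end of the zero-lag lobe
    mid = corr[start + 1:-1]          # candidate samples
    is_peak = ((mid > corr[start:-2]) & (mid >= corr[start + 2:])
               & (mid >= rel_height * corr[0]))
    hits = np.flatnonzero(is_peak)
    return None if hits.size == 0 else start + 1 + int(hits[0])

def synthetic_note(freq, fs, duration=0.1, second_harmonic=0.8):
    """Sine at freq plus a second harmonic (a crude stand-in for a guitar note)."""
    t = np.arange(int(duration * fs)) / fs
    return (np.sin(2 * np.pi * freq * t)
            + second_harmonic * np.sin(2 * np.pi * 2 * freq * t))

fs = 44100  # assumed sampling rate
for true_freq in (329.63, 440.0, 493.88):  # E4, A4, B4
    lag = first_significant_peak(autocorr_nonneg_lags(synthetic_note(true_freq, fs)))
    print(true_freq, fs / lag)
```

Without interpolation the estimate is quantized to whole-sample lags, so it is only accurate to within a percent or so. Real recordings with noise, decay, and inharmonic partials are much harder than this test signal.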

The autocorrelation method is not always right. You may want to implement a more sophisticated method like YIN:

http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf

or MPM:

http://www.cs.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf

Both of the above papers are good reads.
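To give a rough idea of what YIN does, here is a simplified sketch of its core steps: the difference function, the cumulative mean normalized difference, and the absolute threshold. It leaves out the paper's parabolic interpolation and best-local-estimate steps, so it is not a full implementation. The function name, the 80-1000 Hz search range, and the 0.1 threshold are assumptions chosen for illustration.

```python
import numpy as np

def yin_pitch(x, fs, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Simplified YIN-style pitch estimate in Hz (whole-sample resolution)."""
    x = np.asarray(x, dtype=float)
    tau_min = int(fs / fmax)
    tau_max = int(fs / fmin)
    w = len(x) - tau_max              # integration window length
    if w <= 0:
        raise ValueError("signal too short for the requested fmin")

    # Difference function d(tau) = sum_j (x[j] - x[j + tau])^2
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = x[:w] - x[tau:tau + w]
        d[tau] = np.dot(diff, diff)

    # Cumulative mean normalized difference: d(tau) / mean(d[1..tau])
    cmnd = np.ones(tau_max + 1)
    running_sum = np.maximum(np.cumsum(d[1:]), 1e-12)
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / running_sum

    # Absolute threshold: first lag that dips below the threshold,
    # then walk down to the bottom of that dip
    below = np.flatnonzero(cmnd[tau_min:] < threshold)
    if below.size == 0:
        tau = tau_min + int(np.argmin(cmnd[tau_min:]))  # fallback: global minimum
    else:
        tau = tau_min + int(below[0])
        while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
            tau += 1
    return fs / tau

fs = 44100
t = np.arange(int(0.1 * fs)) / fs
print(yin_pitch(np.sin(2 * np.pi * 440.0 * t), fs))
```

Because the normalization starts at 1 and stays high for small lags, very short lags like the 3-5 samples seen in the question are much less likely to be picked. As far as I know, recent librosa releases also provide a librosa.yin function, which may be worth trying before writing your own.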
