
How to calculate pitch fundamental frequency f(0) in the time domain?

I am new to DSP and am trying to calculate the fundamental frequency (f(0)) for each segmented frame of an audio file. The methods of F0 estimation can be divided into three categories:

  • methods based on the temporal dynamics of the signal (time domain);
  • methods based on the frequency structure (frequency domain); and
  • hybrid methods.

Most of the examples I have found estimate the fundamental frequency from the frequency structure (frequency domain). I am looking for one based on the temporal dynamics of the signal (time domain).

This article provides some information, but I am still not clear on how to calculate it in the time domain:

https://gist.github.com/endolith/255291

This is the code I have found and used so far:

def freq_from_autocorr(sig, fs):
    """
    Estimate frequency using autocorrelation
    """
    # Calculate autocorrelation and throw away the negative lags
    corr = correlate(sig, sig, mode='full')
    corr = corr[len(corr)//2:]

    # Find the first low point
    d = diff(corr)
    start = nonzero(d > 0)[0][0]

    # Find the next peak after the low point (other than 0 lag). This bit is
    # not reliable for long signals, due to the desired peak occurring between
    # samples, and other peaks appearing higher.
    # Should use a weighting function to de-emphasize the peaks at longer lags.
    peak = argmax(corr[start:]) + start
    px, py = parabolic(corr, peak)

    return fs / px

How can I estimate it in the time domain?

Thanks in advance!

It is a correct implementation. Not very robust, but it certainly works. To verify this, we can generate a signal of known frequency and see what result we get:

import numpy as np
from scipy.signal import correlate, fftconvolve

fs = 44100        # sampling rate in Hz
frequency = 440   # test-tone frequency in Hz
length = 0.01     # in seconds

# Note: linspace includes the endpoint, so the sample spacing here is
# length / (N - 1), slightly larger than 1 / fs.
t = np.linspace(0, length, int(fs * length))
y = np.sin(frequency * 2 * np.pi * t)

def parabolic(f, x):
    xv = 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
    yv = f[x] - 1/4. * (f[x-1] - f[x+1]) * (xv - x)
    return (xv, yv)

def freq_from_autocorr(sig, fs):
    """
    Estimate frequency using autocorrelation
    """
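    # Autocorrelation; keep only the non-negative lags
    corr = correlate(sig, sig, mode='full')
    corr = corr[len(corr)//2:]
    # Skip the lag-0 peak: find the first point where corr starts rising again
    d = np.diff(corr)
    start = np.nonzero(d > 0)[0][0]
    # Highest peak after that point, refined to sub-sample precision
    peak = np.argmax(corr[start:]) + start
    px, py = parabolic(corr, peak)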

    return fs / px

Result

Running freq_from_autocorr(y, fs) gets us ~442.014 Hz, roughly 0.45% error.
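Part of that error may come from the test signal rather than from the estimator: np.linspace(0, length, N) includes the endpoint, so the samples are spaced length / (N - 1) apart instead of 1 / fs. The sketch below is only a sanity check under that assumption. It repeats the two functions so that it runs on its own, uses np.correlate in place of scipy.signal.correlate (a swap made here to keep it NumPy-only), and compares the linspace grid with an exact 1 / fs grid. The names t_linspace, t_exact and results are illustrative.

```python
import numpy as np

def parabolic(f, x):
    # Vertex of the parabola through the samples at x-1, x and x+1
    xv = 0.5 * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
    yv = f[x] - 0.25 * (f[x-1] - f[x+1]) * (xv - x)
    return xv, yv

def freq_from_autocorr(sig, fs):
    # Assumption: np.correlate stands in for scipy.signal.correlate
    corr = np.correlate(sig, sig, mode="full")
    corr = corr[len(corr) // 2:]                  # keep non-negative lags
    start = np.nonzero(np.diff(corr) > 0)[0][0]   # first rise after lag 0
    peak = np.argmax(corr[start:]) + start
    return fs / parabolic(corr, peak)[0]

fs, frequency, length = 44100, 440, 0.01
n = int(fs * length)

t_linspace = np.linspace(0, length, n)   # spacing = length / (n - 1)
t_exact = np.arange(n) / fs              # spacing = 1 / fs

results = {}
for name, t in (("linspace", t_linspace), ("exact grid", t_exact)):
    estimate = freq_from_autocorr(np.sin(2 * np.pi * frequency * t), fs)
    results[name] = estimate
    error_pct = 100 * abs(estimate - frequency) / frequency
    print(f"{name}: {estimate:.3f} Hz, error {error_pct:.2f}%")
```

The relative error is computed as 100 * |estimate - 440| / 440, which is how the percentage quoted above is obtained.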

Bonus: we can improve

We can make it more precise and robust with slightly more code:

def indexes(y, thres=0.3, min_dist=1, thres_abs=False):
    """Peak detection routine borrowed from 
    https://bitbucket.org/lucashnegri/peakutils/src/master/peakutils/peak.py
    """
    if isinstance(y, np.ndarray) and np.issubdtype(y.dtype, np.unsignedinteger):
        raise ValueError("y must be signed")

    if not thres_abs:
        thres = thres * (np.max(y) - np.min(y)) + np.min(y)

    min_dist = int(min_dist)

    # compute first order difference
    dy = np.diff(y)

    # propagate left and right values successively to fill all plateau pixels (0-value)
    zeros, = np.where(dy == 0)

    # check if the signal is totally flat
    if len(zeros) == len(y) - 1:
        return np.array([])

    if len(zeros):
        # compute first order difference of zero indexes
        zeros_diff = np.diff(zeros)
        # check when zeros are not chained together
        zeros_diff_not_one, = np.add(np.where(zeros_diff != 1), 1)
        # make an array of the chained zero indexes
        zero_plateaus = np.split(zeros, zeros_diff_not_one)

        # fix if leftmost value in dy is zero
        if zero_plateaus[0][0] == 0:
            dy[zero_plateaus[0]] = dy[zero_plateaus[0][-1] + 1]
            zero_plateaus.pop(0)

        # fix if rightmost value of dy is zero
        if len(zero_plateaus) and zero_plateaus[-1][-1] == len(dy) - 1:
            dy[zero_plateaus[-1]] = dy[zero_plateaus[-1][0] - 1]
            zero_plateaus.pop(-1)

        # for each chain of zero indexes
        for plateau in zero_plateaus:
            median = np.median(plateau)
            # set leftmost values to leftmost non zero values
            dy[plateau[plateau < median]] = dy[plateau[0] - 1]
            # set rightmost and middle values to rightmost non zero values
            dy[plateau[plateau >= median]] = dy[plateau[-1] + 1]

    # find the peaks by using the first order difference
    peaks = np.where(
        (np.hstack([dy, 0.0]) < 0.0)
        & (np.hstack([0.0, dy]) > 0.0)
        & (np.greater(y, thres))
    )[0]

    # handle multiple peaks, respecting the minimum distance
    if peaks.size > 1 and min_dist > 1:
        highest = peaks[np.argsort(y[peaks])][::-1]
        rem = np.ones(y.size, dtype=bool)
        rem[peaks] = False

        for peak in highest:
            if not rem[peak]:
                sl = slice(max(0, peak - min_dist), peak + min_dist + 1)
                rem[sl] = True
                rem[peak] = False

        peaks = np.arange(y.size)[~rem]

    return peaks

def freq_from_autocorr_improved(signal, fs):
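    # Remove DC offset (on a copy, so the caller's array is left untouched
    # and integer input does not raise a casting error)
    signal = signal - np.mean(signal)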
    corr = fftconvolve(signal, signal[::-1], mode='full')
    corr = corr[len(corr)//2:]

    # Find the first peak on the left
    i_peak = indexes(corr, thres=0.8, min_dist=5)[0]
    i_interp = parabolic(corr, i_peak)[0]

    return fs / i_interp, corr, i_interp

Running freq_from_autocorr_improved(y, fs) yields ~441.825 Hz, roughly 0.41% error. This method will perform better for more complex cases and takes about twice as long to compute.

By sampling longer (i.e. setting length to e.g. 0.1 s), we will obtain more accurate results.
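Since the question asks for one F0 value per segmented frame, here is a minimal sketch of how the autocorrelation estimator could be applied frame by frame. It is an illustration, not a tuned pitch tracker: f0_per_frame, frame_len=2048 and hop=1024 are hypothetical choices, np.correlate is used instead of scipy.signal.correlate to keep it NumPy-only, and the two helper functions are repeated so that the block runs on its own. In line with the remark above, longer frames should give steadier estimates at the cost of time resolution.

```python
import numpy as np

def parabolic(f, x):
    # Vertex of the parabola through the samples at x-1, x and x+1
    xv = 0.5 * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
    yv = f[x] - 0.25 * (f[x-1] - f[x+1]) * (xv - x)
    return xv, yv

def freq_from_autocorr(sig, fs):
    # Assumption: np.correlate stands in for scipy.signal.correlate
    corr = np.correlate(sig, sig, mode="full")
    corr = corr[len(corr) // 2:]                  # keep non-negative lags
    start = np.nonzero(np.diff(corr) > 0)[0][0]   # first rise after lag 0
    peak = np.argmax(corr[start:]) + start
    return fs / parabolic(corr, peak)[0]

def f0_per_frame(signal, fs, frame_len, hop):
    """Hypothetical helper: return (frame start times in s, F0 in Hz)."""
    starts = np.arange(0, len(signal) - frame_len + 1, hop)
    f0 = np.array([freq_from_autocorr(signal[s:s + frame_len], fs)
                   for s in starts])
    return starts / fs, f0

# Half a second of a 440 Hz test tone on an exact 1 / fs grid
fs = 44100
t = np.arange(int(0.5 * fs)) / fs
y = np.sin(2 * np.pi * 440 * t)

times, f0 = f0_per_frame(y, fs, frame_len=2048, hop=1024)
for start_time, estimate in zip(times[:3], f0[:3]):
    print(f"t = {start_time:.3f} s: {estimate:.2f} Hz")
```

Real recordings contain silent or unvoiced frames, for which the autocorrelation has no meaningful peak (an all-zero frame would make the np.nonzero lookup fail), so a practical version would need to detect and skip those frames.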
