
Can aubio be used to detect rhythm-only segments?

Does aubio have a way to detect sections of a piece of audio that lack tonal elements -- rhythm only? I tested a piece of music that has 16 seconds of rhythm at the start, but all the aubiopitch and aubionotes algorithms seemed to detect tonality during the rhythmic section. Could it be tuned somehow to distinguish tonal from non-tonal onsets? Or is there a related library that can do this?

Been busy the past couple of days, but I started looking into this today...

It'll take a while to perfect, I guess, but I thought I'd give you a few thoughts and some code I've started working on to attack this!

Firstly, pseudocode is a good way to design an initial method.

1/ Use import matplotlib.pyplot as plt to spectrum-analyse the audio, and plot various FFTs and audio signals.

2/ import numpy as np for basic array-like structure handling.

(I know this is more than pseudocode, but hey :-)

3/ plt.specgram creates spectral maps of your audio. Apart from the image it creates (which can be used to start manually deconstructing your audio file), it returns 4 structures.

e.g.

ffts, freqs, times, img = plt.specgram(signal, Fs=44100)

ffts is a 2-dimensional array: each column is the FFT (Fast Fourier Transform) of one time section, so the rows correspond to frequency bins and the columns to time.
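A quick sanity check of those return values, as a sketch on a synthetic 1-second sine (assuming the default NFFT of 256, which gives 129 one-sided frequency bins):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt

fs = 44100
t = np.arange(fs) / fs                   # 1 second of audio
signal = np.sin(2 * np.pi * 440 * t)     # a 440 Hz sine

ffts, freqs, times, img = plt.specgram(signal, Fs=fs)

# rows are frequency bins (NFFT/2 + 1 = 129 of them),
# columns are the time sections
print(ffts.shape[0], len(freqs), ffts.shape[1] == len(times))  # 129 129 True
```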

The plain-vanilla specgram analyses time sections 256 samples long, stepping 128 samples forward each time.

This gives a very low-resolution frequency array at a pretty fast rate.

As musical notes merge into a single sound when repeated at more or less 10 Hz, I decided to use the specgram options to divide the audio into 4096-sample lengths (circa 10 Hz frequency resolution), stepping forward every 2048 samples (i.e. about 20 times a second).

This gives decent frequency resolution, and the time sections, being a 20th of a second apart, are closer together than people can perceive individual notes.

This means calling specgram as follows:

plt.specgram(signal, Fs=44100, NFFT=4096, noverlap=2048, mode='magnitude')

(Note the mode argument - this seems to give me amplitudes of between 0 and 0.1. I have an open problem where the fft doesn't give me amplitudes on the same scale as the audio signal (you may have seen the question I posted). But here we are...)
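The resolution figures above are easy to sanity-check; assuming 44.1 kHz audio:

```python
fs, nfft, hop = 44100, 4096, 2048
bin_width = fs / nfft        # frequency resolution of each FFT bin (Hz)
frames_per_sec = fs / hop    # how many spectra we get per second
print(round(bin_width, 2), round(frames_per_sec, 2))  # 10.77 21.53
```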

4/ Next I decided to get rid of noise in the ffts returned. This means we can concentrate on freqs of a decent amplitude, and zero out the noise which (in my experience) is always present in ffts.

Here is my function:

def gate(signal, minAmplitude):
    # ((x + |x|) / 2) is x when x > 0 and 0 otherwise, so the comparison is
    # True only for amplitudes above minAmplitude; int(True) * a keeps a
    return np.array([int((((a - minAmplitude) + abs(a - minAmplitude)) / 2) > 0) * a for a in signal])

Looks a bit crazy - and I'm sure a proper mathematician could come up with something more efficient - but this is the best I could invent. It zeros any frequencies with amplitude at or below minAmplitude.
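For what it's worth, a vectorised equivalent with np.where (a sketch - it produces the same result as the list-comprehension version, just letting numpy do the comparison in one pass):

```python
import numpy as np

def gate_vectorized(signal, minAmplitude):
    # keep amplitudes strictly above the threshold, zero the rest
    signal = np.asarray(signal)
    return np.where(signal > minAmplitude, signal, 0)
```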

Here is the relevant code to call it on the ffts returned by plt.specgram. My actual function is more involved, as it is part of a class and references other functions, but this standalone version should be enough:

def fft_noise_gate(ffts, minAmplitude=0.001, check=True):
    '''
    zero the amplitudes of frequencies
    with amplitudes below minAmplitude
    across ffts (one column per time section)
    check - plot middle fft just because!
    '''
    nffts = ffts.shape[1]
    gated_ffts = []
    for f in range(nffts):
        fft = ffts[..., f]
        # Anyone got a more efficient noise gate formula? Best I could think up!
        fft_gated = gate(fft, minAmplitude)
        gated_ffts.append(fft_gated)
    gated = np.array(gated_ffts)
    if check:
        # plot middle fft just to see!
        plt.plot(gated[nffts // 2])
        plt.show(block=False)
    return gated
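Putting the pieces together on a synthetic signal (a sketch: the gate and noise-gate definitions are reproduced compactly here so the snippet runs standalone, and the 440 Hz tone plus low-level noise is just made-up test material):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def gate(signal, minAmplitude):
    return np.where(np.asarray(signal) > minAmplitude, signal, 0)

def fft_noise_gate(ffts, minAmplitude=0.001):
    return np.array([gate(ffts[..., f], minAmplitude)
                     for f in range(ffts.shape[1])])

fs = 44100
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
# one second of a 440 Hz tone buried in very low-level noise
signal = np.sin(2 * np.pi * 440 * t) + 0.001 * rng.standard_normal(fs)

ffts, freqs, times, img = plt.specgram(
    signal, Fs=fs, NFFT=4096, noverlap=2048, mode='magnitude')

gated = fft_noise_gate(ffts, minAmplitude=0.001)
# the noise-only bins are zeroed out, the tone survives
print(np.count_nonzero(gated) < np.count_nonzero(ffts), gated.max() > 0)
```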

This should give you a start. I'm still working on it and will get back to you when I've got further - but if you have any ideas, please share them.

Anyway, my strategy from here is to:

1/ find the peaks (ie the start of any sounds), then

2/ look for ranges of frequencies which rise and fall in unison (ie make up a sound).

And

3/ differentiate them into individual instruments (sound sources, more specifically), and plot their times and amplitudes to create your analysis (score).
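Step 1/ above could be sketched as a crude energy-jump detector over the spectrogram (the threshold_ratio of 1.5 is a made-up starting value to tune by ear, not anything from aubio):

```python
import numpy as np

def onset_candidates(ffts, times, threshold_ratio=1.5):
    """Flag time sections where total spectral energy jumps
    relative to the previous section - candidate sound onsets."""
    energy = np.abs(ffts).sum(axis=0)       # one number per time section
    prev = np.maximum(energy[:-1], 1e-12)   # avoid divide-by-zero
    jumps = energy[1:] > threshold_ratio * prev
    return times[1:][jumps]

# toy spectrogram: quiet for 50 frames, then a loud sound starts
ffts = np.full((10, 100), 0.01)
ffts[:, 50:] = 1.0
times = np.arange(100) * 0.05               # 20 frames per second
print(onset_candidates(ffts, times))  # [2.5] - one onset at 2.5 seconds
```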

Hope you're having fun with it - I know I am.

As I said, any thoughts...

Regards

Tony

Use a spectrum analyser to detect sections with high amplitude. If you can program, you could take each section and average the frequencies (and amplitudes) present, to give you an idea of the instrument(s) involved in creating that amplitude peak.
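As a sketch of that averaging idea (the five-bin cut-off is arbitrary, and a real instrument would need its harmonics grouped too):

```python
import numpy as np

def section_profile(ffts, freqs, start, end):
    """Average the spectrum over a high-amplitude section and return
    the loudest frequency bins - a rough fingerprint of the source."""
    avg = np.abs(ffts[:, start:end]).mean(axis=1)
    top = np.argsort(avg)[::-1][:5]          # five loudest bins
    return [(float(f), float(a)) for f, a in zip(freqs[top], avg[top])]

# toy data: a section dominated by the 300 Hz bin
freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0, 500.0])
ffts = np.zeros((6, 10))
ffts[3, :] = 1.0
print(section_profile(ffts, freqs, 0, 10)[0])  # (300.0, 1.0)
```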

Hope that helps - if you're using python I could give you some pointers on how to program this?

Regards

Tony
