
Comparing audio recordings

I have 5 recorded wav files. I want to compare new incoming recordings with these files and determine which one each recording resembles most.

The final product has to be implemented in C++ on Linux, but for now I am experimenting in Matlab. I can plot the FFTs very easily, but I don't know how to compare them.

How can I compute the similarity of two FFT plots?

Edit: There is only speech in the recordings. Actually, I am trying to identify the answering-machine responses of a few telecom companies. It is enough to distinguish two messages: "this person can not be reached at the moment" and "this number is not used anymore".

This depends a lot on your definition of "resembles most". Depending on your use case, it can mean many different things. If you just want to compare the bare spectra of the whole files, you can correlate the values returned by the two FFTs.
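As a rough sketch of the whole-file comparison, assuming both recordings have the same length and sample rate: take the magnitude spectrum of each signal and compute the Pearson correlation between the two. The helper names (`magnitude_spectrum`, `pearson`, `spectral_similarity`, `tone`) are made up for illustration, and the naive DFT only stands in for a real FFT routine such as Matlab's `fft`.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 (fine for a short demo)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def spectral_similarity(sig_a, sig_b):
    """Correlate whole-signal magnitude spectra (assumes equal lengths)."""
    return pearson(magnitude_spectrum(sig_a), magnitude_spectrum(sig_b))

def tone(freq_bin, n=64):
    """Hypothetical test signal: a sine with freq_bin cycles over n samples."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]

same_tone = spectral_similarity(tone(5), tone(5))
shifted = tone(5)[7:] + tone(5)[:7]          # same content, circularly delayed
same_tone_delayed = spectral_similarity(tone(5), shifted)
different_tone = spectral_similarity(tone(5), tone(12))
```

Because only magnitudes are compared, a pure delay does not change the score, but any change in speaking rate or message length does.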

However, spectra tend to change a lot when the recordings are warped in time. To account for this, you need to do a windowed FFT and compare the spectra window by window. This then defines the difference function you can use in a dynamic time warping algorithm.
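A minimal sketch of that idea, under stated assumptions: `frame_spectra` cuts the signal into Hann-windowed frames (frame length and hop are arbitrary choices here), `frame_distance` is a plain Euclidean distance between two frame spectra, and `dtw_cost` is the textbook dynamic time warping recurrence without any path constraints. All names are illustrative, and `two_tone` only generates synthetic test signals.

```python
import cmath
import math

def frame_spectra(signal, frame_len=32, hop=16):
    """Hann-windowed magnitude spectra, one list per frame (naive DFT)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * t / frame_len)
                    for t, x in enumerate(chunk)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames

def frame_distance(a, b):
    """Euclidean distance between two frame spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_cost(seq_a, seq_b, dist=frame_distance):
    """Minimum cumulative frame distance over all monotonic alignments."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]

def two_tone(first_freq, first_len, second_freq, second_len):
    """Test signal: one sine followed by another (frequencies in cycles/sample)."""
    return [
        math.sin(2 * math.pi * (first_freq if t < first_len else second_freq) * t)
        for t in range(first_len + second_len)
    ]

original = frame_spectra(two_tone(0.05, 128, 0.2, 128))
stretched = frame_spectra(two_tone(0.05, 192, 0.2, 96))   # same order, warped in time
swapped = frame_spectra(two_tone(0.2, 128, 0.05, 128))    # different order of sounds
```

In this toy case the time-stretched signal stays much closer to the original than the one with the sounds in a different order, which is the behaviour a whole-file spectrum cannot give you.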

If you need perceptual resemblance, an FFT probably will not get you what you need. MFCCs of the recordings are most likely much better suited to this problem. Again, you might need to calculate windowed MFCCs instead of MFCCs of the whole recording.
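For orientation, here is a simplified sketch of how MFCCs for a single frame can be computed: power spectrum, triangular mel filterbank, log, then a DCT-II. It is not a drop-in replacement for a tested library implementation. The 8 kHz sample rate, 12 filters, and 8 coefficients are assumptions chosen for telephone speech, and every function name is illustrative.

```python
import cmath
import math
import random

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hann-window the frame and return |DFT|^2 for bins 0..n//2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(windowed))) ** 2
        for k in range(n // 2 + 1)
    ]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * hz / sample_rate) for hz in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):          # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank

def mfcc_frame(frame, sample_rate=8000, n_filters=12, n_coeffs=8):
    """MFCCs of one frame: log mel-band energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = [
        math.log(sum(w * p for w, p in zip(weights, spectrum)) + 1e-12)
        for weights in bank
    ]
    return [
        sum(e * math.cos(math.pi * k * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]

rng = random.Random(0)                           # deterministic noise frame for the demo
demo_frame = [rng.uniform(-1.0, 1.0) for _ in range(256)]
quiet = mfcc_frame(demo_frame)
loud = mfcc_frame([2.0 * x for x in demo_frame])
c0_shift = loud[0] - quiet[0]
```

In this formulation, scaling the input level only changes coefficient 0, so comparing coefficients 1 and up makes the measure largely independent of recording volume. The resulting per-frame vectors can be fed to the same kind of frame-by-frame comparison as windowed spectra.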

If you have musical recordings, you again need completely different approaches. There is a blog post that describes how Shazam works, which you should be able to find on Google. Or, if you want real musical similarity, have a look at this book.

EDIT:

The best solution for the problem specified above would be the one described here (the "Shazam algorithm" mentioned above). It is, however, a bit complicated to implement, and an easier solution might do well enough.

If you know that there are only 5 different possible incoming files, I would suggest first trying something as simple as the Euclidean distance between the two signals (in the time domain or the Fourier domain). It is likely to give you good results.
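A minimal sketch of the nearest-reference idea, assuming the signals are already aligned in time and share a sample rate. Truncating to the shorter length is a simplification of this sketch, and the reference names and sample values are invented for the example. The same function works on magnitude spectra instead of raw samples.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance over the common length of two sequences."""
    n = min(len(a), len(b))          # assumption: ignore any extra tail
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)))

def closest_reference(incoming, references):
    """Return the name of the reference with the smallest distance to incoming.

    references maps a label to its list of samples (or spectrum values).
    """
    return min(references, key=lambda name: euclidean_distance(incoming, references[name]))

# Hypothetical toy data standing in for the stored recordings.
references = {
    "unreachable": [0.0, 1.0, 0.0, -1.0, 0.0, 1.0],
    "disconnected": [1.0, 1.0, -1.0, -1.0, 1.0, 1.0],
}
incoming = [0.1, 0.9, -0.1, -1.1, 0.0, 1.0]
match = closest_reference(incoming, references)
```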

Edit: So, since the recordings can start at different times, try computing the cross-correlation between the incoming recording and each file, and see which file gives the highest peak.
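A sketch of the correlation-peak approach for recordings with unknown start offsets. Correlating the incoming signal against each stored file is a cross-correlation, so that is what the code computes, normalized by the energies of both signals. The function names, the `max_lag` search range, and the toy data are assumptions of this example.

```python
import math

def correlation_peak(incoming, reference, max_lag):
    """Best normalized cross-correlation over lags in [-max_lag, max_lag].

    A positive lag means the reference content starts `lag` samples into
    `incoming`. Returns (peak_value, lag).
    """
    norm = math.sqrt(sum(x * x for x in incoming) * sum(y * y for y in reference))
    if norm == 0.0:
        return 0.0, 0
    best_value, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, y in enumerate(reference):
            j = i + lag
            if 0 <= j < len(incoming):
                total += incoming[j] * y
        value = total / norm
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag

def best_match(incoming, references, max_lag):
    """Name of the reference whose correlation peak is highest."""
    return max(references, key=lambda name: correlation_peak(incoming, references[name], max_lag)[0])

# Hypothetical toy data: message "a" arrives with 5 samples of leading silence.
message_a = [1.0, -2.0, 3.0, -1.0, 2.0, 0.5, -3.0, 1.0]
message_b = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
incoming = [0.0] * 5 + message_a + [0.0] * 3
peak_a, lag_a = correlation_peak(incoming, message_a, max_lag=8)
peak_b, lag_b = correlation_peak(incoming, message_b, max_lag=8)
winner = best_match(incoming, {"a": message_a, "b": message_b}, max_lag=8)
```

For full-length recordings this direct loop is slow; an FFT-based correlation is the usual replacement, but the decision rule stays the same.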

I suggest you compute a simple sound parameter such as the fundamental frequency. There are several methods of getting this value; I tried autocorrelation and the cepstrum, and for voice signals both worked fine. With such a function working, you can do a time analysis and compare two signals (base: the one you compare against; in: the one you would like to match) interval by interval, by frequency. Comparing several intervals based on such criteria can tell you which base sample matches best.
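A minimal sketch of the autocorrelation variant, assuming 8 kHz audio and a search range of 60 to 400 Hz for the voice pitch (both assumptions, as are the function names and frame sizes). `estimate_f0` picks the lag with the largest autocorrelation, and `pitch_contour` applies it interval by interval so that two recordings can be compared as sequences of pitch values.

```python
import math

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Fundamental frequency in Hz from the autocorrelation peak, or 0.0 if none."""
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_value, best_lag = 0.0, None
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_value, best_lag = value, lag
    if best_lag is None:             # silent or non-periodic frame
        return 0.0
    return sample_rate / best_lag

def pitch_contour(signal, sample_rate, frame_len=400, hop=200):
    """Fundamental frequency estimated on successive intervals of the signal."""
    return [
        estimate_f0(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

# Synthetic check: a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
demo = [math.sin(2 * math.pi * 200.0 * t / sample_rate) for t in range(800)]
demo_f0 = estimate_f0(demo, sample_rate)
demo_contour = pitch_contour(demo, sample_rate)
silent_f0 = estimate_f0([0.0] * 800, sample_rate)
```

The contours of the incoming and base recordings can then be compared with any sequence distance, keeping in mind that real speech contains unvoiced stretches where the estimate is not meaningful.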

Of course, everything depends on what you mean by "resembles most". To compare the signals you can also introduce other parameters, such as volume, noise, clicks, pitch...
