
How do I go about finding audio files that play the same song but are in different compressed formats?

Original post: 2015-02-07 08:14:04. Tags: java, python, ffmpeg, hidden-markov-models, fuzzy-logic

All I want is this: suppose I have the same song saved as song.mp3 and song.aac. I want my program to identify that they are the same. I know this is a non-trivial task.

So far I have tried fingerprinting the audio with the dejavu Python library, but it produces two different fingerprints for song.mp3 and song.aac, so it doesn't suit my program's needs.

I also tried computing an MD5 hash with FFmpeg, but, as expected, it gives different hashes even for the same song downloaded from different websites.
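A minimal, stdlib-only sketch of why an exact hash cannot work here. A synthetic 440 Hz tone stands in for the decoded song.mp3, and a tiny added signal stands in for the error a different lossy codec introduces; both are assumptions for illustration, not real decoder output. Changing even a few PCM samples changes the MD5 digest completely, while a tolerance-based measure such as correlation still sees the two signals as nearly identical.

```python
import hashlib
import math
import struct


def pcm_bytes(samples):
    """Pack float samples in [-1.0, 1.0] as 16-bit little-endian PCM bytes."""
    return b"".join(struct.pack("<h", int(round(s * 32767))) for s in samples)


def correlation(a, b):
    """Pearson correlation of two equal-length sample sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# One second of a 440 Hz tone at 8 kHz stands in for the decoded "song.mp3".
rate = 8000
original = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]

# A small deterministic perturbation stands in for lossy-codec error in "song.aac".
reencoded = [s + 0.001 * math.sin(2 * math.pi * 3000 * t / rate)
             for t, s in enumerate(original)]

md5_a = hashlib.md5(pcm_bytes(original)).hexdigest()
md5_b = hashlib.md5(pcm_bytes(reencoded)).hexdigest()
print("MD5 equal:", md5_a == md5_b)
print("correlation: %.4f" % correlation(original, reencoded))
```

Correlation on raw samples only works if both files are decoded to the same sample rate and are time-aligned. Real MP3 and AAC encoders add different amounts of leading padding (encoder delay), which is one reason dedicated fingerprinting methods exist.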

Do you have any idea how I should proceed? A step-by-step procedure and a library to achieve my goal would be great. Thank you.

1 answer

Audio fingerprinting is incredibly complex and difficult to get right. You do not really want to come up with your own algorithm on the spot, because it will most likely be much worse than established methods (being better than established methods requires doing some research ;-)).
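To illustrate what makes a fingerprint different from a hash, here is a deliberately toy sketch. It is loosely in the spirit of energy-based fingerprints and is not Echoprint's or dejavu's algorithm: one bit per frame records whether the frame's energy rose or fell, and two fingerprints are compared by bit error rate instead of by equality. The synthetic signals, the 256-sample frame size, and the noise level are hypothetical choices made only for this demo.

```python
import math
import random


def synth_song(seed, frames=400, frame=256, rate=8000):
    """Hypothetical test signal: a 440 Hz tone whose loudness changes every frame."""
    rng = random.Random(seed)
    out = []
    for f in range(frames):
        amp = rng.uniform(0.1, 1.0)
        for k in range(frame):
            t = f * frame + k
            out.append(amp * math.sin(2 * math.pi * 440 * t / rate))
    return out


def frame_energies(samples, frame=256):
    """Sum of squared samples for each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame])
            for i in range(0, len(samples) - frame + 1, frame)]


def fingerprint(samples, frame=256):
    """One bit per frame: 1 if energy rose relative to the previous frame."""
    e = frame_energies(samples, frame)
    return [1 if e[i] > e[i - 1] else 0 for i in range(1, len(e))]


def bit_error_rate(fp_a, fp_b):
    """Fraction of positions where two fingerprints disagree."""
    n = min(len(fp_a), len(fp_b))
    return sum(a != b for a, b in zip(fp_a, fp_b)) / n


song_a = synth_song(seed=1)
noise_rng = random.Random(42)
# Small Gaussian noise stands in for the differences a second codec introduces.
song_a_noisy = [s + noise_rng.gauss(0.0, 0.01) for s in song_a]
song_b = synth_song(seed=2)  # a different "song"

fp_a, fp_noisy, fp_b = fingerprint(song_a), fingerprint(song_a_noisy), fingerprint(song_b)
print("same song, re-encoded:", bit_error_rate(fp_a, fp_noisy))
print("different song:       ", bit_error_rate(fp_a, fp_b))
```

The frames here line up perfectly with how the signals were generated. A real system must also cope with time offsets, resampling, and equalization differences, which is exactly the hard part that makes an established library preferable.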

One open-source audio fingerprinting solution I found is http://echoprint.me/codegen

You can use it in your application either by calling directly into the libcodegen API or by spawning subprocesses for audio analysis.
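A minimal sketch of the subprocess route in Python. It assumes an `echoprint-codegen` executable is installed on PATH, takes the audio file path as its first argument, and prints its result as JSON on stdout; check the codegen README for the exact arguments and output fields of your build. The helper `run_codegen` is hypothetical: it only runs the tool and parses its JSON, and the command is a parameter so it can point at a different binary.

```python
import json
import subprocess


def run_codegen(path, command=("echoprint-codegen",)):
    """Run a fingerprinting CLI on `path` and parse the JSON it prints.

    Assumption: the tool writes a JSON document to stdout and exits with
    status 0 on success. A non-zero exit raises CalledProcessError.
    """
    result = subprocess.run(
        list(command) + [path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode stdout as str for json.loads
        check=True,           # fail loudly if the tool reports an error
    )
    return json.loads(result.stdout)
```

For example, `run_codegen("song.mp3")` and `run_codegen("song.aac")` would each return the parsed codegen output; how those codes are then matched against each other is outside this sketch. Running the analysis as a separate process also means a crash in native decoding code cannot take down your application.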