
Efficient method for checking quality of a sound recording

Original 2013-08-14 21:09:01 8 2 c#/ speech-recognition/ audio-fingerprinting

We have various wave files from live, uncontrolled recordings that come in from one of our server-side processes, and most of them have good, clear speech throughout. However, sometimes they are garbled, contain static, or the speech volume isn't loud enough. Is there an efficient method for determining whether a recording is of "good" quality using C#?

I thought about taking the spectrogram of a known good recording and comparing it to the spectrogram of a bad recording, but the recordings will contain different speech each time, so this might not work. I've looked into libraries like Bass.Net and NAudio, but audio processing is not my field of expertise.

I could try comparing audio fingerprints, but I'm not entirely sure how this works. I saw that someone was attempting to compare two audio files using their audio fingerprint hashes and the Levenshtein distance algorithm to find the degree of similarity between them. Unless the hashes produced by audio fingerprinting are similar for similar audio files, this method won't work.

Another thought of mine was to use some sort of speech recognition API to process the speech and write a transcript of the audio to a text file. The problem is that speech recognition isn't very accurate, and APIs like Microsoft's Speech API may still try to recognize speech even in a garbled recording or one with a lot of static. I saw that Nuance has an SDK version of their speech recognition software, but I haven't had a chance to look at it yet, since they don't seem to offer a trial version of the SDK on their website.

2 answers

You can use existing open source tools to measure the signal-to-noise ratio (SNR) of noisy speech. For details see http://labrosa.ee.columbia.edu/projects/snreval/

I recommend trying WADA SNR:

http://www.cs.cmu.edu/~robust/archive/algorithms/WADA_SNR_IS_2008/

It's a pretty simple algorithm, but it's not trivial to design it yourself.

Fingerprinting and ASR definitely won't work, since they try to eliminate noise, not to detect it.
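To give a feel for what an SNR check could look like in C#, here is a minimal sketch. It is not the WADA SNR algorithm and not what the snreval tools do; it is a much cruder energy-based estimate that assumes the quietest 10% of frames are background noise and the loudest 10% are speech. The class name `CrudeSnr`, the method `EstimateDb`, and the default frame size of 400 samples (25 ms at an assumed 16 kHz sample rate) are all hypothetical. The samples are assumed to be mono and already decoded to floats in [-1, 1], for example with one of the libraries mentioned in the question.

```csharp
using System;
using System.Linq;

public static class CrudeSnr
{
    // Estimates SNR in dB from mono samples normalized to [-1, 1].
    // Assumption: the quietest 10% of frames are background noise and the
    // loudest 10% are speech plus noise. This is NOT the WADA SNR algorithm.
    public static double EstimateDb(float[] samples, int frameSize = 400)
    {
        int frameCount = samples.Length / frameSize;
        if (frameCount < 10)
            throw new ArgumentException("Need at least 10 full frames.", nameof(samples));

        // Mean power of each non-overlapping frame.
        var power = new double[frameCount];
        for (int f = 0; f < frameCount; f++)
        {
            double sum = 0;
            for (int i = 0; i < frameSize; i++)
            {
                double s = samples[f * frameSize + i];
                sum += s * s;
            }
            power[f] = sum / frameSize;
        }

        // Sort ascending so the quietest frames come first.
        Array.Sort(power);
        int tail = frameCount / 10;
        double noise = power.Take(tail).Average();
        double speechPlusNoise = power.Skip(frameCount - tail).Average();

        const double floor = 1e-12; // avoids log of zero on digital silence
        double speech = Math.Max(speechPlusNoise - noise, floor);
        return 10.0 * Math.Log10(speech / Math.Max(noise, floor));
    }
}
```

A low value suggests heavy static or very quiet speech, but any cutoff for rejecting a file would have to be tuned against recordings you have already judged good or bad. A recording with no pauses at all will break the "quietest frames are noise" assumption, which is one reason a purpose-built estimator such as WADA SNR is the better choice.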

I am also searching for a solution to a similar problem, and I found this open source project: https://github.com/dpwe/audfprint. You can create a database and then compare your query (the audio whose quality you're not sure of) against the database.