
Determining sound quality from an audio recording?

Original post: 2013-06-26 14:37:36 | Tags: python, audio, noise

Is there any way to algorithmically determine audio quality from a .wav or .mp3 file?

Basically I have users with diverse recording setups (i.e., they are from all over the world and I have no control over them) recording audio to mp3/wav files. At that point the software should determine whether their setup is okay or not (tragically, for some reason they are not capable of making this determination just by listening to their own recordings, so occasionally we get recordings that are basically impossible to understand due to low volume or high noise).

I was doing a volume check to make sure the microphone level was okay; unfortunately this misses cases where the volume is high but the clarity is low. I'm wondering if there is some kind of standard scan I can do (ideally in Python) that detects when there is a lot of background noise.

I realize one possible solution is to ask them to record total silence, then compare that to the spoken recording and consider the audio "bad" if the volume of the "silent" recording is too close to the volume of the spoken recording. But that depends on getting a good sample from the speaker both times, which may or may not be something I can depend on.
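For reference, the silence-versus-speech comparison described above could be sketched roughly like this. It assumes both recordings have already been decoded to float samples in the range [-1, 1]; the function names and the 15 dB margin are made up for illustration and would need tuning against real recordings.

```python
import numpy as np


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (samples are floats in [-1, 1])."""
    samples = np.asarray(samples, dtype=np.float64)
    rms = np.sqrt(np.mean(samples ** 2))
    # Clamp to avoid log10(0) on a completely silent (all-zero) recording.
    return 20.0 * np.log10(max(rms, 1e-12))


def speech_to_silence_db(speech, silence):
    """How many dB louder the spoken recording is than the 'silent' one."""
    return rms_dbfs(speech) - rms_dbfs(silence)


def setup_looks_ok(speech, silence, min_margin_db=15.0):
    """Hypothetical pass/fail rule: the margin threshold is an assumption."""
    return speech_to_silence_db(speech, silence) >= min_margin_db
```

A small margin means the "silent" recording is nearly as loud as the speech, which is the "bad" case described in the question.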

So I'm wondering if instead there's just a way to scan through an audio file (these would be ~10 seconds long) and recognize whether the sound file is "noisy" or clear.

3 Solutions

I am building an API that aims to detect various kinds of bad audio. You can use this API to compute an overall score and also give people specific recommendations on how to improve their sound quality. Have a look:
https://www.tinydrop.tech/documentation/#loudness-detection

It all depends on what your quality problems are, which is not 100% clear from your question, but here are some suggestions:

In the case where volume is high and clarity is low, I'm guessing the problem is that the user has the input gain too high. After the recording, you can simply check for distortion. Even better, you can use Automatic Gain Control (AGC) during recording to prevent this from happening in the first place.
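One simple way to check for distortion after the fact is to look for clipping, i.e. samples stuck at or near full scale. The sketch below is only one possible check, not a complete distortion detector; it assumes float samples in [-1, 1], and the function name and the 0.99 threshold are illustrative choices.

```python
import numpy as np


def clipped_fraction(samples, threshold=0.99):
    """Fraction of samples whose magnitude is at or above `threshold`.

    Samples are assumed to be floats in [-1, 1]. A clean recording should
    return a value very close to 0; an overdriven one returns noticeably more.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.size == 0:
        return 0.0
    return float(np.mean(np.abs(samples) >= threshold))
```

What counts as "too much" clipping is something you would have to decide by looking at known good and bad recordings.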

In the case of too much noise, I'm assuming the issue is that the speaker is too far from the mic. In this case Steve's suggestion might work, but to make it really work, you'd need to do a ton of work comparing sample recordings and developing statistics to see how you can discriminate. In practice, I think this is too much work. A simpler alternative that I think is more likely to work (although not guaranteed) would be to create an envelope of your signal, then create a histogram from that and see how the histogram compares to those of existing good and bad recordings. If we are talking about speech only, you could divide the signal into three frequency bands (with a time-domain filter, not an FFT) to give you an idea of how much is noise (the high and low bands) and how much is sound you care about (the center band).
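A rough sketch of both ideas, using NumPy and `scipy.signal`. Everything specific here is an assumption for illustration: the envelope is approximated by per-frame RMS, the speech band is taken as 300 to 3400 Hz, and the filters are 4th-order Butterworth. Samples are assumed to be floats in [-1, 1], and the sample rate must be more than twice `high_hz`.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def envelope_histogram(samples, sample_rate, frame_ms=20.0, bins=20, floor_db=-80.0):
    """Normalized histogram of the per-frame RMS level (a crude envelope) in dBFS."""
    samples = np.asarray(samples, dtype=np.float64)
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Clamp very quiet frames to the floor so log10 never sees zero.
    level_db = 20.0 * np.log10(np.maximum(rms, 10.0 ** (floor_db / 20.0)))
    counts, edges = np.histogram(level_db, bins=bins, range=(floor_db, 0.0))
    return counts / max(n_frames, 1), edges


def band_energy_fractions(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Share of signal energy in the (low, mid, high) bands via time-domain filters."""
    samples = np.asarray(samples, dtype=np.float64)
    filters = (
        butter(4, low_hz, btype="lowpass", fs=sample_rate, output="sos"),
        butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos"),
        butter(4, high_hz, btype="highpass", fs=sample_rate, output="sos"),
    )
    energies = np.array([np.sum(sosfilt(sos, samples) ** 2) for sos in filters])
    total = energies.sum()
    if total == 0.0:
        return 0.0, 0.0, 0.0
    low, mid, high = energies / total
    return float(low), float(mid), float(high)
```

The histogram of a clean speech recording would be expected to show quiet frames (pauses) well separated from loud frames (speech), while a noisy one would be bunched together; a low mid-band fraction would suggest that much of the energy is outside the speech band. Neither number means anything on its own until it is compared against recordings you already know are good or bad.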

Again, though, I would use an AGC during recording, and if the AGC finds it needs to set the input gain too high, it's probably a bad recording.

Not quite my field, but I suspect that if you get a spectrum (do a Fourier transform, maybe) and compare "good" and "noisy" recordings, you will find that the noise raises the level across the spectrum in the bad recordings relative to the good ones. Take a look at the signal processing section in SciPy; it can probably help.
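As a starting point, something like the following could be used to compare spectra. It is only a sketch: it uses `scipy.signal.welch` to get an averaged power spectrum, and then takes a low percentile of the spectrum as a stand-in for the broadband noise level. The function names and the 10th-percentile choice are assumptions, and samples are assumed to be floats in [-1, 1].

```python
import numpy as np
from scipy.signal import welch


def average_spectrum_db(samples, sample_rate, nperseg=1024):
    """Welch-averaged power spectral density in dB."""
    samples = np.asarray(samples, dtype=np.float64)
    freqs, psd = welch(samples, fs=sample_rate, nperseg=nperseg)
    # Clamp to avoid log10(0) in bins that contain no energy at all.
    return freqs, 10.0 * np.log10(np.maximum(psd, 1e-20))


def noise_floor_db(samples, sample_rate, percentile=10.0):
    """Low percentile of the spectrum, used here as a rough noise-floor estimate."""
    _, psd_db = average_spectrum_db(samples, sample_rate)
    return float(np.percentile(psd_db, percentile))
```

If the suspicion above is right, noisy recordings should give a clearly higher `noise_floor_db` than good ones, but where to draw the line would have to come from comparing real examples.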