
Sound detection using Python

To perform an end-to-end test of an embedded platform that plays musical notes, we are trying to record via a microphone and identify whether a specific sound was played through the device's speakers. The testing setup is not a real-time system, so we don't really know when (or even if) the expected sound begins and ends.

The expected sound is represented in a wave file (or similar) we can read from disk.

How can we run a test that asserts whether the sound was played as expected?

There are a few ways to tackle this problem:

  1. Convert the expected sound into a sequence of frequency-amplitude pairs. Then, record the sound via the microphone and convert that recording into a corresponding sequence of frequency-amplitude pairs. Finally, compare the two sequences to see if they match.

    1. This task can be accomplished using the scipy, numpy, and matplotlib modules.

    2. We'll need to generate a sequence of frequency-amplitude pairs for the expected sound. We can do this by using the scipy.io.wavfile.read() function to read in a wave file containing the expected sound. This function returns a tuple containing the sample rate (in samples per second) and a numpy array containing the amplitudes of the waveform. We can then use the numpy.fft.fft() function to convert the amplitudes into a sequence of frequency-amplitude pairs.

    3. We'll need to record the sound via the microphone. For this, we'll use the pyaudio module. We can create a PyAudio object using the pyaudio.PyAudio() constructor, then use the open() method to open a stream on the microphone. We can then read blocks of data from the stream using the read() method. Each block of data will be a numpy array containing the amplitudes of the waveform at that particular moment in time. We can then use the numpy.fft.fft() function to convert the amplitudes into a sequence of frequency-amplitude pairs.

    4. Finally, we can compare the two sequences of frequency-amplitude pairs to see if they match. If they do match, we can conclude that the expected sound was played and recorded correctly; if they don't, we can conclude that it was not.
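Steps 2-4 can be sketched as follows. This is a minimal sketch, assuming the recording has already been captured into a numpy array; the 440 Hz tone and the noisy copy below are synthetic stand-ins for the real wave-file data and microphone input:

```python
import numpy as np

def dominant_frequencies(samples, sample_rate, top_n=3):
    """Return the top_n frequencies (in Hz) with the largest FFT magnitude."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    top = np.argsort(spectrum)[-top_n:]
    return sorted(freqs[top])

# Stand-ins: a 440 Hz tone plays the role of the expected sound, and a
# noisy copy plays the role of the microphone recording.
rate = 44100
t = np.linspace(0, 1, rate, endpoint=False)
expected = np.sin(2 * np.pi * 440 * t)
recorded = expected + 0.1 * np.random.default_rng(0).normal(size=rate)

exp_peaks = dominant_frequencies(expected, rate, top_n=1)
rec_peaks = dominant_frequencies(recorded, rate, top_n=1)
# match within a small tolerance, since recordings are never bit-exact
match = all(abs(a - b) < 5 for a, b in zip(exp_peaks, rec_peaks))
```

Comparing only the dominant frequencies (rather than the full FFT, sample by sample) makes the test robust to volume differences and background noise.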

  2. Use a sound recognition system to identify the expected sound in the recording.

from pydub import AudioSegment
from pydub.silence import split_on_silence
from pydub.playback import play

def get_sound_from_recording():
    sound = AudioSegment.from_wav("recording.wav")
    # split on silences longer than 1000 ms; anything quieter than the
    # default -16 dBFS threshold is considered silence, and 200 ms of
    # silence is kept at the beginning and end of each chunk
    chunks = split_on_silence(sound, min_silence_len=1000, keep_silence=200)
    for chunk in chunks:
        play(chunk)
    return chunks  # return all detected chunks, not just the first
  3. Cross-correlate the recording with the expected sound. This produces a sequence of values indicating how closely the recording matches the expected sound at each offset. A high value at a particular time index indicates that the recording and the expected sound match closely at that point.
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

# read in the recording and the expected sound (file names are placeholders)
sampling_freq, audio = wavfile.read('audio_file.wav')
_, expected = wavfile.read('expected_sound.wav')

# cross-correlate the recording with the expected sound; a sharp peak
# marks the offset at which the expected sound occurs in the recording
corr = signal.correlate(audio, expected, mode='valid')

# plot the cross-correlation signal
plt.plot(corr)
plt.show()

This way you can set up your test to check if you are getting the correct output.
