得到播放wav音频级别作为输出

Question

当播放的wav文件发出声音时，我想制作一个可以移动或发光的说话口。 因此，我需要检测wav文件何时发言或何时在单词之间保持沉默。 目前我正在使用我找到的pygame脚本

import pygame
pygame.mixer.init()
pygame.mixer.music.load("my_sentence.wav")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy() == True:
    continue

我想我可以在while循环中进行一些检查以查看声音输出级别，或类似的东西，然后将其发送到其中一个gpio输出。 但我不知道如何实现这一目标。

任何帮助将非常感激

Answer 1

当语音存在时，您需要检查WAV文件。 最简单的方法是寻找大声和安静的时期。 因为声音适用于波浪，当它静音时，波形文件中的值不会发生太大变化，而当声音很大时它们会发生很大的变化。

估计响度的一种方法是方差。 正如你可以看到的文章，这可以定义为E[(X - mu)^2] ，它可以写成average((X - average(X))^2) 。 这里，X是给定点处的信号值（存储在WAV文件中的值，在代码中称为sample ）。 如果它发生了很大变化，那么方差就会很大。

这可以让你计算整个文件的响度。 但是，您希望在任何给定时间跟踪文件的大小，这意味着您需要一种移动平均值。 一个简单的方法是使用一阶低通滤波器。

我没有测试下面的代码，所以它不太可能工作，但它应该让你开始。 它加载WAV文件，使用低通滤波器来跟踪均值和方差，并在方差高于和低于某个阈值时计算出来。 然后，在播放WAV文件时，它会跟踪自开始播放以来的时间，并打印出WAV文件是否响亮或安静。

以下是您可能仍需要做的事情：

修复我在代码中的所有故意错误
添加一些有用的东西来响应大声/安静的变化
更改阈值和reaction_time以获得良好的音频效果
添加一些滞后（可变阈值）以阻止灯光闪烁

我希望这有帮助！

import wave
import struct
import time

def get_loud_times(wav_path, threshold=10000, time_constant=0.1):
    '''Work out which parts of a WAV file are loud.
        - threshold: the variance threshold that is considered loud
        - time_constant: the approximate reaction time in seconds'''

    wav = wave.open(wav_path, 'r')
    length = wav.getnframes()
    samplerate = wav.getframerate()

    assert wav.getnchannels() == 1, 'wav must be mono'
    assert wav.getsampwidth() == 2, 'wav must be 16-bit'

    # Our result will be a list of (time, is_loud) giving the times when
    # when the audio switches from loud to quiet and back.
    is_loud = False
    result = [(0., is_loud)]

    # The following values track the mean and variance of the signal.
    # When the variance is large, the audio is loud.
    mean = 0
    variance = 0

    # If alpha is small, mean and variance change slower but are less noisy.
    alpha = 1 / (time_constant * float(sample_rate))

    for i in range(length):
        sample_time = float(i) / samplerate
        sample = struct.unpack('<h', wav.readframes(1))

        # mean is the average value of sample
        mean = (1-alpha) * mean + alpha * sample

        # variance is the average value of (sample - mean) ** 2
        variance = (1-alpha) * variance + alpha * (sample - mean) ** 2

        # check if we're loud, and record the time if this changes
        new_is_loud = variance > threshold
        if is_loud != new_is_loud:
            result.append((sample_time, new_is_loud))
        is_loud = new_is_loud

    return result

def play_sentence(wav_path):
    loud_times = get_loud_times(wav_path)
    pygame.mixer.music.load(wav_path)

    start_time = time.time()
    pygame.mixer.music.play()

    for (t, is_loud) in loud_times:
        # wait until the time described by this entry
        sleep_time = start_time + t - time.time()
        if sleep_time > 0:
            time.sleep(sleep_time)

        # do whatever
        print 'loud' if is_loud else 'quiet'

得到播放wav音频级别作为输出

问题描述

1 个解决方案

解决方案1
7 已采纳

得到播放wav音频级别作为输出

问题描述

1 个解决方案

解决方案1 7 已采纳

解决方案1
7 已采纳