How to get accurate timing using a microphone in Python

Question

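I am trying to do beat detection with the PC microphone and then use the beat timestamps to calculate the distance between several consecutive beats. I chose Python because there is plenty of material available and development is fast. By searching the internet I came up with this simple code (no advanced peak detection or anything else yet; that will come later if needed):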

import pyaudio
import struct
import math
import time


SHORT_NORMALIZE = (1.0/32768.0)


def get_rms(block):
    # RMS amplitude is defined as the square root of the
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into
    # a string of 16-bit samples...

    # we will get one short out for each
    # two chars in the string.
    count = len(block) // 2  # integer number of 16-bit samples
    fmt = "%dh" % count
    shorts = struct.unpack(fmt, block)

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768.
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt(sum_squares / count)


CHUNK = 32
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

elapsed_time = 0
prev_detect_time = 0

def close_stream():
    stream.stop_stream()
    stream.close()
    p.terminate()


try:
    while True:
        data = stream.read(CHUNK)
        amplitude = get_rms(data)
        if amplitude > 0.05:  # value set by observing graphed data captured from mic
            elapsed_time = time.perf_counter() - prev_detect_time
            if elapsed_time > 0.1:  # guard against multiple spikes at beat point
                print(elapsed_time)
                prev_detect_time = time.perf_counter()
except KeyboardInterrupt:
    pass  # stop with Ctrl+C
finally:
    close_stream()  # always release the audio device

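The code works very well in silence, and I was quite satisfied for the first few moments of running it, but then I tested how accurate it is and was a bit less satisfied. To test it I used two methods: a phone running a metronome set to 60 bpm (playing tic-toc sounds into the microphone), and an Arduino connected to a buzzer, triggered at 1 Hz by an accurate Chronodot RTC. The buzzer beeps, which triggers a detection. The results of both methods look similar (the numbers are the distance between two beat detections, in seconds):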

0.9956681643835616
1.0056331689497717
0.9956100091324198
1.0058207853881278
0.9953449497716891
1.0052103013698623
1.0049350136986295
0.9859074337899543
1.004996383561644
0.9954095342465745
1.0061518904109583
0.9953025753424658
1.0051235068493156
1.0057199634703196
0.984839305936072
1.00610396347032
0.9951862648401821
1.0053146301369864
0.9960100821917806
1.0053391780821919
0.9947373881278523
1.0058608219178105
1.0056580091324214
0.9852110319634697
1.0054473059360731
0.9950465753424638
1.0058237077625556
0.995704694063928
1.0054566575342463
0.9851026118721435
1.0059882374429243
1.0052523835616398
0.9956161461187207
1.0050863926940607
0.9955758173515932
1.0058052968036577
0.9953960913242028
1.0048014611872205
1.006336876712325
0.9847434520547935
1.0059712876712297

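Now I am quite confident that at least the Arduino is accurate to 1 ms (which is the target accuracy). The results tend to be off by ±5 ms, but occasionally drop by 15 ms, which is unacceptable. Is there a way to achieve higher accuracy, or is this a limitation of Python, the sound card, or something else? Thanks!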
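The size of those errors can be read off directly by converting each interval into its deviation from the expected 1 s period. A minimal sketch; `deviations_ms` is a hypothetical helper, and the list holds only a few values copied from the output above:

```python
def deviations_ms(intervals_s, expected_s=1.0):
    """Return each measured interval's deviation from the expected period, in ms."""
    return [(t - expected_s) * 1000.0 for t in intervals_s]


# A few intervals (in seconds) copied from the measurements above.
measured = [
    0.9956681643835616,
    1.0056331689497717,
    0.9859074337899543,
    0.984839305936072,
]

devs = deviations_ms(measured)
for d in devs:
    print(f"{d:+.2f} ms")
print(f"worst case: {max(devs, key=abs):+.2f} ms")
```

On these sample values the largest deviation is the roughly 15 ms drop described above.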

Edit: after incorporating the suggestions from tom10 and barny, the code looks like this:

import pyaudio
import struct
import math
import psutil
import os


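def set_high_priority():
    # Raise this process's priority so the OS services the audio stream sooner.
    # psutil.HIGH_PRIORITY_CLASS exists only on Windows; on Unix-like systems a
    # negative nice value is used instead, which usually needs root privileges.
    proc = psutil.Process(os.getpid())
    try:
        if os.name == "nt":
            proc.nice(psutil.HIGH_PRIORITY_CLASS)
        else:
            proc.nice(-10)
    except psutil.AccessDenied:
        pass  # keep the default priority if we are not allowed to change it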


SHORT_NORMALIZE = (1.0/32768.0)


def get_rms(block):
    # RMS amplitude is defined as the square root of the
    # mean over time of the square of the amplitude.
    # so we need to convert this string of bytes into
    # a string of 16-bit samples...

    # we will get one short out for each
    # two chars in the string.
    count = len(block) // 2  # integer number of 16-bit samples
    fmt = "%dh" % count
    shorts = struct.unpack(fmt, block)

    # iterate over the block.
    sum_squares = 0.0
    for sample in shorts:
        # sample is a signed short in +/- 32768.
        # normalize it to 1.0
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n

    return math.sqrt(sum_squares / count)


CHUNK = 4096
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RUNTIME_SECONDS = 10

set_high_priority()

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

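elapsed_time = 0  # ms of audio read before the current chunk
prev_detect_time = 0
amplitudes = []  # RMS of every sample group, collected for later inspection
TIME_PER_CHUNK = 1000 / RATE * CHUNK  # ms of audio per chunk (about 92.9 ms)
SAMPLE_GROUP_SIZE = 32  # samples per RMS group (2 bytes each), about 0.73 ms at 44.1 kHz
TIME_PER_GROUP = 1000 / RATE * SAMPLE_GROUP_SIZE  # ms of audio per group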

for i in range(0, int(RATE / CHUNK * RUNTIME_SECONDS)):
    data = stream.read(CHUNK)
    time_in_chunk = 0
    group_index = 0
    for j in range(0, len(data), (SAMPLE_GROUP_SIZE * 2)):
        group = data[j:(j + (SAMPLE_GROUP_SIZE * 2))]
        amplitude = get_rms(group)
        amplitudes.append(amplitude)
        if amplitude > 0.02:
            current_time = (elapsed_time + time_in_chunk)
            time_since_last_beat = current_time - prev_detect_time
            if time_since_last_beat > 500:
                print(time_since_last_beat)
                prev_detect_time = current_time
        time_in_chunk = (group_index+1) * TIME_PER_GROUP
        group_index += 1
    elapsed_time = (i+1) * TIME_PER_CHUNK

stream.stop_stream()
stream.close()
p.terminate()

With this code I get the following results (this time the unit is milliseconds instead of seconds):

999.909297052154
999.9092970521542
999.9092970521542
999.9092970521542
999.9092970521542
1000.6349206349205
999.9092970521551
999.9092970521524
999.9092970521542
999.909297052156
999.9092970521542
999.9092970521542
999.9092970521524
999.9092970521542

If I haven't made any mistakes, this looks much better than before and reaches sub-millisecond accuracy. Thanks to tom10 and barny for their help.
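Every interval in the edited output is a whole number of sample groups: with `SAMPLE_GROUP_SIZE = 32` at 44100 Hz one group lasts about 0.726 ms, so the detector's timing resolution is one group rather than one sample. The small check below reuses the constants from the edited code (`groups_between` is a hypothetical helper) to recover the group counts behind the printed values:

```python
RATE = 44100
SAMPLE_GROUP_SIZE = 32
TIME_PER_GROUP = 1000 / RATE * SAMPLE_GROUP_SIZE  # ms per group, about 0.7256


def groups_between(interval_ms):
    """Number of whole sample groups that make up a reported interval."""
    return round(interval_ms / TIME_PER_GROUP)


for interval in (999.909297052154, 1000.6349206349205):
    n = groups_between(interval)
    print(f"{interval:.6f} ms = {n} groups = {n * SAMPLE_GROUP_SIZE} samples")
```

A true 1 s period is 44100 samples, which falls between 1378 and 1379 groups (44096 and 44128 samples); that is why the output is mostly 999.909 ms with an occasional 1000.635 ms. If finer resolution is needed, one option is smaller groups, or locating the first above-threshold sample inside the triggering group.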

Answer 1

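The reason you aren't getting correct beat times is that you are losing chunks of audio data. That is, the sound card is reading chunks, but you aren't collecting the data before it is overwritten by the next chunk.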

First, for this problem you need to distinguish between timing accuracy and real-time responsiveness.

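The sound card's timing accuracy should be very good, much better than a millisecond, and you should be able to capture all of that accuracy in the data you read from the sound card. Your operating system's real-time responsiveness, on the other hand, should be very poor, much worse than a millisecond. That is, you should easily be able to locate an audio event (such as a beat) to within a millisecond, but not identify it at the moment it happens (instead, 30 to 200 ms later, depending on your system). This arrangement generally works for computers because human perception of when events occur is far coarser than a millisecond (except in rare specialized perceptual systems, such as comparing the timing of auditory events between the two ears).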
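In other words, an event's time should come from its position in the sample stream, not from the wall clock at the moment `stream.read` returns. A minimal sketch, assuming a fixed sample rate and that no chunk has been dropped; `event_time_ms` is a hypothetical helper:

```python
def event_time_ms(chunk_index, offset_in_chunk, chunk_size, rate):
    """Time of a sample, in ms since the stream started, from its position alone.

    Assumes every chunk since the start was read (none dropped), so the
    absolute sample index is chunk_index * chunk_size + offset_in_chunk.
    """
    sample_index = chunk_index * chunk_size + offset_in_chunk
    return 1000.0 * sample_index / rate


# Two beats exactly 44100 samples (1 s) apart; OS latency never enters the result.
first = event_time_ms(10, 500, 4096, 44100)
second = event_time_ms(20, 3640, 4096, 44100)
print(f"interval: {second - first:.3f} ms")
```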

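The specific problem with your code is that CHUNK is too small for the OS to query the sound card for every chunk. You have 32, so at 44100 Hz the OS needs to get to the sound card every 0.7 ms, which is too short an interval for a computer that is responsible for many other things. If the OS doesn't fetch a chunk before the next one arrives, the original chunk is overwritten and lost.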
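The 0.7 ms figure is just the chunk length divided by the sample rate. The same arithmetic for a few chunk sizes (chosen only as examples) shows how long the OS has to collect each buffer:

```python
RATE = 44100  # samples per second, as in the question


def chunk_duration_ms(chunk_size, rate=RATE):
    """How long one chunk of `chunk_size` samples lasts, in milliseconds."""
    return 1000.0 * chunk_size / rate


for size in (32, 1024, 4096):
    print(f"CHUNK = {size:5d} -> {chunk_duration_ms(size):7.2f} ms per read")
```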

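To make this work within the constraints above, make CHUNK much larger than 32, more like 1024 (as in the PyAudio examples). Depending on your computer and what it is doing, even that may not be long enough.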
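Combining this advice with timestamping by sample position gives a detector that reads large chunks but still times each beat to within one small group. This is an illustrative sketch, not the question's exact code: `detect_beats` and `synthetic_clicks` are hypothetical names, the threshold and 500 ms guard are taken from the question's edited code, and it assumes native-endian 16-bit mono data (as PyAudio delivers for `paInt16`) with no dropped chunks. The synthetic signal stands in for a microphone so the logic can be checked without audio hardware.

```python
import math
import struct


def detect_beats(chunks, rate=44100, group_size=32, threshold=0.02, refractory_ms=500.0):
    """Yield beat times (ms since start) from an iterable of 16-bit mono byte chunks.

    Times come from the running sample count, so they do not depend on when
    the OS delivered each chunk. Assumes no chunk was dropped in between.
    """
    samples_seen = 0
    last_beat_ms = None
    for data in chunks:
        shorts = struct.unpack("%dh" % (len(data) // 2), data)
        for start in range(0, len(shorts), group_size):
            group = shorts[start:start + group_size]
            rms = math.sqrt(sum((s / 32768.0) ** 2 for s in group) / len(group))
            if rms > threshold:
                # Stamp the beat at the start of the group that crossed the threshold.
                t_ms = 1000.0 * (samples_seen + start) / rate
                if last_beat_ms is None or t_ms - last_beat_ms > refractory_ms:
                    yield t_ms
                    last_beat_ms = t_ms
        samples_seen += len(shorts)


def synthetic_clicks(rate=44100, seconds=3, chunk=4096, period=44100, click_len=64):
    """Silence with a short click every `period` samples, split into byte chunks."""
    total = rate * seconds
    samples = [0] * total
    for start in range(0, total, period):
        for k in range(start, min(start + click_len, total)):
            samples[k] = 20000
    raw = struct.pack("%dh" % total, *samples)
    step = chunk * 2  # bytes per chunk (2 bytes per sample)
    return [raw[i:i + step] for i in range(0, len(raw), step)]


beats = list(detect_beats(synthetic_clicks()))
print([round(b, 3) for b in beats])
```

With a real stream, the chunks could come from a generator such as `(stream.read(CHUNK) for _ in range(n_chunks))`, with `stream` opened as in the question. The resolution is still one group (about 0.73 ms at 44.1 kHz), because each detection is stamped at the start of the group whose RMS crosses the threshold.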

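If this approach doesn't work for you, you may need a dedicated real-time system such as an Arduino. (In general this isn't necessary, so think twice before deciding you need an Arduino. Usually, when I see people who really need real time, they are trying to do something very quantitative with humans, such as flashing a light, having the person tap a button, flashing another light, having the person tap another button, and so on, to measure response times.)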

Solution 1 (score 4, accepted, 2018-11-01 17:46:16)
