How do I get the frequency and amplitude of audio that's being recorded in Python 3.x?

I'm trying to record audio and get the average frequency and amplitude of the audio in 1-second intervals without writing to a file. There are plenty of examples of how to do this when reading from a file with pyaudio, but everything I have found that would fit this specific situation uses Python 2.7 libraries that don't seem to exist for Python 3.x.

Any help would be appreciated!

Getting the Audio

I'm not exactly sure which library you are using to record the audio, but the usual go-to for realtime recording and playback (in my opinion) is PyAudio (you only mentioned it for reading from a file).

The PyAudio documentation has examples of blocking and non-blocking audio I/O for realtime processing. Using the blocking-mode example, for instance, you can carry out your DSP processing every time you receive a new block of audio. The playback example below shows the pattern; when recording you would open the stream with input=True and call stream.read(CHUNK) instead of reading frames from a file.

"""PyAudio Example: Play a wave file."""

import pyaudio
import wave
import sys

CHUNK = 1024

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

# instantiate PyAudio (1)
p = pyaudio.PyAudio()

# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# read data
data = wf.readframes(CHUNK)

# play stream (3)
while len(data) > 0:
    stream.write(data)
    data = wf.readframes(CHUNK)
    # Do your DSP processing on the block here, e.g. with a function call

# stop stream (4)
stream.stop_stream()
stream.close()

# close PyAudio (5)
p.terminate()
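
Since the question is about recording rather than playback, here is a minimal sketch of the same blocking pattern reading from the default input device in 1-second blocks. The sample rate, the 16-bit mono format and the names block_to_floats and record_blocks are assumptions made for illustration; they are not part of PyAudio or of the example above.

```python
import numpy as np

RATE = 44100   # samples per second (assumed; use whatever your device supports)
CHUNK = RATE   # frames per block, i.e. one second of audio


def block_to_floats(raw_bytes):
    """Convert raw 16-bit PCM bytes (native byte order) to floats in [-1.0, 1.0)."""
    samples = np.frombuffer(raw_bytes, dtype=np.int16)
    return samples.astype(np.float64) / 32768.0


def record_blocks(seconds):
    """Yield one float array per second of audio from the default input device."""
    # Imported here so block_to_floats can be used without PyAudio installed.
    import pyaudio

    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16,
                    channels=1,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    try:
        for _ in range(seconds):
            # Blocks until CHUNK frames are available, then returns bytes.
            yield block_to_floats(stream.read(CHUNK))
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()
```

Looping over record_blocks(5) would give five arrays of RATE samples each, and nothing is written to disk. Scaling to floats up front also avoids integer overflow when the samples are squared later on.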

Finding the amplitude

If you want the amplitude of the signal at any given point, all you have to do is take the absolute value of one sample in the array of samples (after converting the raw bytes to numbers). For example, to get the amplitude of the 3rd sample of your audio block data:

ampSample3 = abs(data[2])

A single sample's amplitude isn't very useful on its own, so it's better to look at the whole block. You can take the absolute value of every sample, sum them and divide by the block size to get the mean amplitude.

blockAmplitudeMean = numpy.sum(numpy.absolute(data)) / len(data)

But when working with audio we usually want the RMS value of the block.

# data is assumed to be a NumPy float array scaled to the range -1.0 .. 1.0
blockLinearRms = numpy.sqrt(numpy.mean(data**2))  # Linear value between 0 and 1
blockLogRms = 20 * numpy.log10(blockLinearRms)  # Decibel value, from 0 dB down to -inf dB
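
The amplitude measures above can be collected into a few small helpers. This is only a sketch: the function names and the floor parameter are made up for illustration, and the block is assumed to be a NumPy float array scaled to the range -1.0 to 1.0.

```python
import numpy as np


def mean_amplitude(block):
    """Mean of the absolute sample values in the block."""
    return float(np.mean(np.abs(block)))


def rms(block):
    """Root-mean-square level of the block (0.0 to 1.0 for normalized audio)."""
    return float(np.sqrt(np.mean(np.square(block))))


def rms_db(block, floor=1e-12):
    """RMS level in decibels relative to full scale.

    The floor (an assumed value) keeps log10 away from zero, so a silent
    block gives a large negative number instead of -inf.
    """
    return float(20.0 * np.log10(max(rms(block), floor)))
```

For a full-scale sine wave the RMS comes out at about 0.707, which is roughly -3 dB.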

Getting the frequency

In your question you only asked for the frequency of the audio, which could mean one of two things.

Determining the frequency spectrum

The frequency spectrum is commonly analyzed in DSP using the DFT (Discrete Fourier Transform). You will usually see this under the name FFT (Fast Fourier Transform), since that is the most popular implementation of the DFT. There are already Python libraries that implement the FFT for you and are simple to use.

Please note that this will give you an array the length of your block that contains complex values (magnitude and phase), i.e. the frequency information. This does not mean you can necessarily identify the pitch of incoming audio: you can't directly tell that someone is playing an A1 note on the piano unless the signal is really high quality, and even then you need some basic DSP processing in addition to the FFT.

For reference:

  • Here is a link to scipy.fft and how to get started
  • And here is the link for numpy.fft with a couple of examples

You can call this function in your processing loop if you want to do something with the frequency information.
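
As a rough illustration of what that could look like with numpy.fft, the sketch below computes the magnitude spectrum of one block and picks the strongest bin. The helper names and the Hann window are my own choices for the example, and it uses rfft, which returns only the non-negative frequencies of a real signal, so the result has len(block) // 2 + 1 entries rather than the full block length.

```python
import numpy as np


def magnitude_spectrum(block, rate):
    """Return (frequencies in Hz, magnitudes) for one block of real samples."""
    # A Hann window reduces leakage from the block edges.
    window = np.hanning(len(block))
    spectrum = np.fft.rfft(block * window)
    freqs = np.fft.rfftfreq(len(block), d=1.0 / rate)
    return freqs, np.abs(spectrum)


def dominant_frequency(block, rate):
    """Frequency of the strongest bin; not necessarily the perceived pitch."""
    freqs, magnitudes = magnitude_spectrum(block, rate)
    return float(freqs[np.argmax(magnitudes)])
```

With a 1-second block the bins are 1 Hz apart, so a clean 440 Hz tone should be reported at the 440 Hz bin. For real recordings the strongest bin can easily be a harmonic or noise.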

Determining the pitch (musical note)

This is a non-trivial task that many people try to accomplish. Most algorithms involve the FFT (as discussed above) but add another layer of complicated processing on top. I would recommend using a library unless you fancy developing your own algorithm.
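
To give a feel for what that extra layer can involve, here is a minimal sketch of one of the simpler approaches, autocorrelation computed through the FFT. It is not what any particular library does; the function name and the fmin/fmax search limits are assumptions, and it will be unreliable on noisy or polyphonic input.

```python
import numpy as np


def estimate_pitch(block, rate, fmin=50.0, fmax=2000.0):
    """Estimate the fundamental frequency in Hz, or None for silence."""
    block = np.asarray(block, dtype=np.float64)
    block = block - np.mean(block)  # remove any DC offset
    n = len(block)

    # Autocorrelation via the power spectrum; zero-padding to 2n
    # avoids circular wrap-around.
    spectrum = np.fft.rfft(block, 2 * n)
    acf = np.fft.irfft(spectrum * np.conj(spectrum))[:n]

    # Only search lags that correspond to the fmin..fmax range.
    lag_min = int(rate / fmax)
    lag_max = min(int(rate / fmin), n - 1)
    if acf[0] <= 0 or lag_max <= lag_min:
        return None

    # The lag with the strongest self-similarity is taken as the period.
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return rate / lag
```

Because the period is measured in whole samples, the estimate is quantized (44100 / 200 = 220.5 Hz is the closest it can get to a 220 Hz tone at 44.1 kHz); real pitch trackers interpolate around the peak and add voicing checks.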
