Understanding frames/samples for Audio

I am trying to understand a GitHub project called "The Amazing Audio Engine", which eases dealing with audio on iOS.

I am capturing audio from the microphone using this method:

id<AEAudioReceiver> receiver = [AEBlockAudioReceiver audioReceiverWithBlock: ^(void *source, const AudioTimeStamp *time, UInt32 frames, AudioBufferList *audio) {
 // Do something with 'audio'
}];

As far as I can see, the library samples at a sampling frequency of 44100 Hz, and when the block runs, frames is 1024. If I understood the whole audio thing correctly, every time this block runs it will deliver something like a snapshot of all frequencies the microphone can capture, from the minimum to the maximum hertz. So, if the whole thing is being sampled at 44100, it means that the whole spectrum will be sliced into 44100 slices.

It is not, but supposing the minimum frequency is 0 Hz and the maximum frequency is 22 kHz, slice 0 will represent the amplitude at 0 Hz and slice 44099 will represent 22 kHz, or in other words, audio[0] = 0 Hz and audio[44099] = 22 kHz, right?

Then I measured how often the block runs, and it is called once every 0.023 seconds. Why? Isn't that slow?

This number does not make sense to me. Shouldn't the block be called at a blazing speed, so that the whole spectrum is sampled at short intervals?

If I understood the whole audio thing correctly, every time this block runs it will deliver something like a snapshot of all frequencies the microphone can capture, from the minimum to the maximum hertz.

No; this is incorrect. Audio data is typically represented in the time domain, not the frequency domain.

In short: think of audio as a waveform. Each sample represents the height of that waveform at a point in time. There are 44100 such samples per second, and each value in the sample array represents one of them. With 44100 samples per second, a block of 1024 samples represents 1024/44100 ≈ 0.023 seconds of audio, which is why the block is called about once every 0.023 seconds.

Image from Wikipedia: https://en.wikipedia.org/wiki/File:Sampled.signal.svg
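To make the arithmetic concrete, here is a minimal sketch in plain C. The function names are made up for illustration and are not part of The Amazing Audio Engine, and the sketch assumes the block holds mono samples stored as float, which may not match the format your audio controller is actually configured with.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of any audio library. */

/* Seconds of audio covered by `frames` frames at `sample_rate` frames per second. */
double block_duration_seconds(unsigned int frames, double sample_rate) {
    return (double)frames / sample_rate;
}

/* Time offset, in seconds from the start of the block, of the sample at `index`. */
double sample_offset_seconds(size_t index, double sample_rate) {
    return (double)index / sample_rate;
}

/* Largest absolute sample value in the block, i.e. the peak height of the
 * waveform during that block. Assumes mono float samples. */
float peak_amplitude(const float *samples, size_t count) {
    float peak = 0.0f;
    for (size_t i = 0; i < count; i++) {
        float magnitude = samples[i] < 0.0f ? -samples[i] : samples[i];
        if (magnitude > peak) {
            peak = magnitude;
        }
    }
    return peak;
}
```

Calling block_duration_seconds(1024, 44100.0) gives roughly 0.0232, matching the interval measured in the question. The index into the array is a position in time, not a frequency: sample i sits i/44100 seconds after the start of the block.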

There is no direct representation of the audio frequency in this data. It is possible to convert a block of time-domain samples to a frequency-domain representation using a Fourier transform, but explaining this is outside the scope of what I can reasonably do in a single answer.
