Audio Visualizer from wav looks wrong

I'm having trouble making an audio visualizer look accurate. The bins that have a significant amount of sound tend to draw correctly, but the problem I'm having is that all the frequencies with no significant sound seem to come back with a value that usually bounces between -60dB and -40dB. This forms a flat bouncing line (usually in the higher frequencies).

I want to display 512 bins or fewer at 30 frames per second. I've been reading up on FFT and audio nonstop for a couple of weeks now, and so far my process has been:

  • Load PCM data from the wav file. This comes in as 44100 samples per second with a range of +/-32767. I'm assuming I treat these as real numbers when passing them to the FFT.
  • Divide these samples up into 1470 per frame (44100 / 30).
  • Take the first 1024 samples of each frame and apply a Hann window. (The remaining 446 are ignored.)
  • Pass the samples to the FFT as an array real[1024], along with another array of the same size filled with zeros for the imaginary part.
  • Get the magnitude by looping through the (samples/2) bins and computing sqrt(real[i]*real[i] + img[i]*img[i]).
  • Take 20 * log(magnitude) to get the decibel level of each bin.
  • Draw a rectangle for each bin, and redraw the bins for every frame.
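For reference, the windowing, magnitude, and decibel steps listed above can be sketched as follows. This is a minimal illustration, not the asker's code: `applyHann` and `spectrumDb` are hypothetical names, a naive O(n^2) DFT stands in for the FFT library, the symmetric Hann formula and the normalization by `fullScale * n / 4` are assumptions chosen so that a full-scale sine centred on a bin reads close to 0 dB. Note that the decibel formula uses the base-10 logarithm (`std::log10`), whereas `std::log` in C++ is the natural logarithm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiply a frame by a symmetric Hann window, in place.
void applyHann(std::vector<double>& frame)
{
    assert(frame.size() >= 2);
    const double pi = std::acos(-1.0);
    const std::size_t n = frame.size();
    for (std::size_t i = 0; i < n; ++i)
        frame[i] *= 0.5 * (1.0 - std::cos(2.0 * pi * i / (n - 1)));
}

// Hypothetical helper: decibel level of bins 0 .. n/2 - 1 of a windowed frame.
// A naive DFT is used only to keep the sketch self-contained.
std::vector<double> spectrumDb(const std::vector<double>& frame, double fullScale)
{
    assert(fullScale > 0.0);
    const double pi = std::acos(-1.0);
    const std::size_t n = frame.size();
    std::vector<double> db(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * k * t / n;
            re += frame[t] * std::cos(angle);
            im -= frame[t] * std::sin(angle);
        }
        // Assumed normalization: a sine of amplitude fullScale gives a peak of
        // about fullScale * n / 2, halved again by the Hann window's gain of 0.5.
        const double magnitude = std::sqrt(re * re + im * im) / (fullScale * n / 4.0);
        // Base-10 logarithm; the tiny offset avoids log10(0) = -infinity.
        db[k] = 20.0 * std::log10(magnitude + 1e-12);
    }
    return db;
}
```

With a 64-sample frame holding a full-scale sine at exactly 8 cycles per frame, bin 8 should land within about 1 dB of 0 dB and bins far from it should sit well below -40 dB.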

I've tested it with a couple of songs, and with a wav file I generated that just plays a tone at 440Hz. With the wav file, I do get a spike at the 440Hz bin, but all the other bins form a line that isn't much shorter than the 440Hz bin. Also, on every other frame, the bins apart from 440Hz look like a graphed log function with a dip at some other bin.

I'm writing this in C++, using STK to load only the left channel from the audio file:

//put every sample in the song into a temporary vector
for (int i = 0; i < stkObject->getSize(); i++)
{
    standardVector.push_back(stkObject->tick(LEFT));
}

I'm using FFTReal to perform the FFT:

    std::vector<std::vector <double> > leftChannelData;
    int numberOfFrames = stkObject->getSize()/samplesPerFrame;

    leftChannelData.resize(numberOfFrames);
    for(int i = 0; i < numberOfFrames; i++)
    {
        for(int j = 0; j < FFT_SAMPLE_LENGTH; j++)
        {
            real[j] = standardVector[j + (i*samplesPerFrame)];
        }

        applyHannWindow(real, FFT_SAMPLE_LENGTH);
        fft_object.do_fft(imaginary,real);

        //FFTReal instructions say to run this after an fft
        fft_object.rescale(real);

        leftChannelData[i].resize(FFT_SAMPLE_LENGTH/2);
        for (int j = 0; j < FFT_SAMPLE_LENGTH/2; j++)
        {
            double magnitude = sqrt(real[j]*real[j] + imaginary[j]*imaginary[j]);
            double dbValue = 20 * log(magnitude/maxMagnitude);

            leftChannelData[i].at(j) = dbValue;
        }
    }

I'm at a loss as to what's causing this. I've tried various ways to pull in those 446 samples I'm ignoring, but the results don't seem to change. I think I may be doing something fundamentally wrong. I've tried normalizing the PCM data before handing it to the FFT, and I've tried normalizing the magnitude before finding the decibels, but neither seems to work. Any thoughts?

EDIT: I don't see any difference between log(magnitude) and log(magnitude/maxMagnitude). All it seems to do is shift all of the bins' values evenly downwards.
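The even shift is what the logarithm's quotient rule predicts. Writing $m_k$ for the magnitude of bin $k$ and $m_{\max}$ for maxMagnitude (notation introduced here for illustration), dividing by a constant only subtracts a constant from every bin:

```latex
20 \log_{10}\!\left(\frac{m_k}{m_{\max}}\right)
  = 20 \log_{10} m_k \;-\; 20 \log_{10} m_{\max}
```

So normalizing by maxMagnitude moves the whole spectrum down by the same number of decibels and leaves the differences between bins unchanged.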

EDIT2: Here's what they look like, to give a visual:

Song playing low sounds - with log(mag)

Song playing low sounds - same but with log(mag/maxMag)

Again, log(mag) and log(mag/maxMag) generally look the same, but with the values shifted into the negative range. Like MSalters said, the decibel value can approach negative infinity, so I can clamp those values to -100dB, then take log(mag/maxMag) and add 100. That way the rectangles' heights range from 0 to 100 instead of -100 to 0.
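The clamp-and-shift idea described here could look roughly like the sketch below. `barHeight` is a hypothetical helper, not part of STK or FFTReal, and it assumes the base-10 logarithm and a floor of -100 dB as in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: map a bin magnitude to a bar height in [0, -floorDb].
// With the default floor of -100 dB the result ranges from 0 to 100.
double barHeight(double magnitude, double maxMagnitude, double floorDb = -100.0)
{
    assert(floorDb < 0.0);
    // A zero magnitude would give log10(0) = -infinity, so treat it as silence.
    if (magnitude <= 0.0 || maxMagnitude <= 0.0)
        return 0.0;
    const double db = 20.0 * std::log10(magnitude / maxMagnitude);
    // Clamp to [floorDb, 0], then shift so the floor maps to a height of 0.
    const double clamped = std::max(floorDb, std::min(db, 0.0));
    return clamped - floorDb;
}
```

For example, a bin at one tenth of the maximum magnitude is at -20 dB and maps to a height of 80, while anything at or below -100 dB maps to 0.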

Is this what I should do? I've tried this, but it still looks wrong. Maybe it's just a scaling issue? When I do this, a lot of the bars don't make it above the line when it sounds like they should. And if they do make it above 0, they do so just barely.

You have to understand that you're not taking the Fourier Transform of an infinite signal, but the FT of a windowed version of it. And your window isn't even a plain Hann window. Discarding 446 points is effectively a rectangular window function. The FTs of both window functions will show up in your output.

Secondly, the dB scale is logarithmic. That indeed means it can go quite low in the absence of a signal. You mention -60 dB, but it could in fact hit minus infinity. The only thing that would save you from that is the window function, which will introduce smear at about -110 dB.

The noise (stop band ripple) produced by a quantized von Hann window of length 1024 could well be around -40 to -60 dB. So one strategy is to just set a threshold, and ignore (don't plot) all values below that threshold.
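A threshold of this kind might be sketched as below. `binsAboveThreshold` is a hypothetical helper, and the threshold itself (somewhere around -40 dB, judging by the figures above) would need tuning against real material.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: return the indices of the bins whose level reaches the
// threshold; every other bin is simply left undrawn.
std::vector<std::size_t> binsAboveThreshold(const std::vector<double>& db,
                                            double thresholdDb)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < db.size(); ++i) {
        if (db[i] >= thresholdDb)
            visible.push_back(i);
    }
    return visible;
}
```

The drawing loop would then iterate over the returned indices instead of over every bin.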

Also, try removing the rescale(real) call, as that could distort your complex vector before you take the log magnitude.

Also, make sure you are actually loading the audio samples into your real vector correctly (sign, number of bits and endianness).
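As a way to check the loaded values against the file, the sketch below decodes raw bytes by hand. It assumes the data chunk holds 16-bit signed little-endian PCM, which is the usual layout for 16-bit wav files. `decodePcm16Le` is a hypothetical helper and is not how STK reads files internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode 16-bit signed little-endian PCM bytes into
// samples scaled to the range [-1.0, 1.0).
std::vector<double> decodePcm16Le(const std::vector<std::uint8_t>& bytes)
{
    assert(bytes.size() % 2 == 0);
    std::vector<double> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Little-endian: the low byte comes first, then the high byte.
        int value = bytes[i] | (bytes[i + 1] << 8);
        // Interpret the 16-bit pattern as two's complement.
        if (value >= 32768)
            value -= 65536;
        samples.push_back(value / 32768.0);
    }
    return samples;
}
```

If the sign or byte order were handled wrongly, a quiet passage would decode to large, noisy values, which would raise the level of every bin.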
