简体   繁体   中英

Audio Visualizer from wav looks wrong

I'm having trouble making an audio visualizer look accurate. The bins that have a significant amount of sound tend to draw correctly, but the problem I'm having is that all the frequencies with no significant sound seem to be coming back with a value that usually bounces between -60dB and -40dB. This forms a flat bouncing line (usually in the higher freqencies).

I want to display 512 bins or less at 30 frames per second. I've been reading up on FFT and audio non stop for a couple weeks now, and so far my process has been:

  • Load pcm data from wav file. This comes in as 44100 samples per second that have a range of -/+ 32767. I'm assuming I treat these as real numbers when passing them to the FFT.
  • Divide these samples up into 1470 per frame. (446 are ignored)
  • Take 1024 samples and apply a Hann window.
  • Pass the samples to FFT as an array of real[1024] as well as another array of the same size filled with zeros for the imaginary part.
  • Get the magnitude by looping through the (samples/2) bins and do a sqrt(real[i]*real[i] + img[i]*img[i]).
  • Taking 20 * log(magnitude) to get the decibel level of each bin
  • Draw a rectangle for each bin. Draw these bins for each frame.

I've tested it with a couple songs, and a wav file I generated that just plays a tone at 440Hz. With the wav file, I do get a spike at the 440 bin, but all the other bins form a line that isn't much shorter than the 440 bin. Also every other frame, the bins apart from 440 look like a graphed log function with a dip on some other bin.

I'm writing this in c++. Using STK to only load left channel from the audio file:

//put every sample in the song into a temporary vector
for (int i = 0; i < stkObject->getSize(); i++)
{
    standardVector.push_back(stkObject->tick(LEFT));
}

I'm using FFTReal to perform the FFT:

    std::vector<std::vector <double> > leftChannelData;
    int numberOfFrames = stkObject->getSize()/samplesPerFrame;

    leftChannelData.resize(numberOfFrames);
    for(int i = 0; i < numberOfFrames; i++)
    {
        for(int j = 0; j < FFT_SAMPLE_LENGTH; j++)
        {
            real[j] = standardVector[j + (i*samplesPerFrame)];
        }

        applyHannWindow(real, FFT_SAMPLE_LENGTH);
        fft_object.do_fft(imaginary,real);

        //FFTReal instructions say to run this after an fft
        fft_object.rescale(real);

        leftChannelData[i].resize(FFT_SAMPLE_LENGTH/2);
        for (int j = 0; j < FFT_SAMPLE_LENGTH/2; j++)
        {
            double magnitude = sqrt(real[j]*real[j] + imaginary[j]*imaginary[j]);
            double dbValue = 20 * log(magnitude/maxMagnitude);

            leftChannelData[i].at(j) = dbValue;
        }
    }

I'm at a loss as to what's causing this. I've tried various ways to pull those 446 samples I'm ignoring, but the results don't seem to change. I think I may be doing something fundamentally wrong. I've tried normalizing the pcm data before handing it to the fft and I've tried normalizing the magnitude before finding the decibels, but it doesn't seem to be working. Any thoughts?

EDIT: I don't see any difference between log(magnitude) and log(magnitude/maxMagnitude). All it seems to do is shift all of the bin's values evenly downwards.

EDIT2: Here's a what they look like to get a visual:

Song playing low sounds - with log(mag)

Song playing low sounds - same but with log(mag/maxMag)

Again, log(mag) and log(mag/maxMag) generally look the same, but with values spanning in the negative. Like MSalters said, the decibel can approach -infinite, so I can clamp those values to -100dB. Then take log(mag/maxMag) and add 100. That way the rectangle's height range from 0 to 100 instead of -100 to 0.

Is this what I should do? I've tried this, but it still looks wrong. Maybe it's just a scaling issue? When I do this, a lot of the bars don't make it above the line when it sounds like they should. And if they do make it above 0, they do so just barely.

You have to understand that you're not taking the Fourier Transform of an infinite signal, but the FT of an windowed version thereof. And your window isn't even a plain Hann window. Discarding 446 points is effectively a rectangular window function. The FT of the window functions will both show up in your output.

Secondly, the dB scale is logarithmic. That indeed means it can go quite low in the absence of a signal. You mention -60 dB, but it in fact could hit minus infinity. The only thing that would save you from that is the window function, which will introduce smear at about -110 dB.

The noise (stop band ripple) produced by a quantized Von Hann window of length 1024 could well be around -40 to -60 dB. So one strategy is to just set a threshold, and ignore (don't plot) all values below that threshold.

Also, try removing the rescale(real) function, as that could distort your complex vector before you take the log magnitude.

Also, make sure you are actually loading the audio samples into your real vector correctly (sign, number of bits and endianess).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM