Collect decoded audio from libav as doubles

I'm currently trying to gather decoded audio data (from multiple formats) to perform certain audio manipulations (using a *.wav file for testing).

I have a class that handles all the decoding via FFmpeg libav. If I extract the data as uint8_t into a vector, write it to a raw file with

for (size_t i = 0; i < bytevector.size(); i++) {
    fwrite(&bytevector[i], sizeof(uint8_t), 1, outfile2);
}

and play it via play -t raw -r 44100 -b16 -c 1 -e signed sound.raw, it sounds perfectly fine.
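Because a std::vector stores its elements contiguously, the byte-by-byte loop can also be written as a single fwrite call. A minimal sketch, assuming the bytes are collected in a std::vector<uint8_t> as above; write_raw is a hypothetical helper name, not part of libav:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes all collected bytes with one fwrite call
// instead of one call per byte. Returns true if every byte was written.
bool write_raw(const std::vector<uint8_t>& bytes, std::FILE* out) {
    assert(out != nullptr); // caller must pass an open file
    if (bytes.empty()) {
        return true; // nothing to write
    }
    // The vector's storage is contiguous, so data() can be passed directly.
    std::size_t written = std::fwrite(bytes.data(), sizeof(uint8_t), bytes.size(), out);
    return written == bytes.size();
}
```

Calling write_raw(bytevector, outfile2) should produce the same sound.raw as the loop, with far fewer library calls.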

However, how can I get the correct sample values as doubles when, for example, the file uses 2 bytes per sample but frame->data is given as uint8_t? The WAV files I've tested are 44100 Hz, 16-bit, 1 channel. (I already have code that converts a uint8_t* into a double.)

Opening the same files in Scilab gives an array of doubles that is half the length of the byte vector.

The WAV file in Scilab, as an array of doubles, shows:
-0.1, -0.099, -0.098, ..., 0.099, +0.1

versus the byte vector:
51, 243, 84, 243, 117, 243, ...

Can 51 and 243 really form a double? Any suggestions on how to get past this issue?
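For reference, the arithmetic for the first pair, assuming 16-bit signed little-endian PCM (low byte first, as WAV stores it): 243 * 256 + 51 = 62259, which read as a signed 16-bit value is 62259 - 65536 = -3277, and -3277 / 32768 ≈ -0.1000, matching Scilab's first value. Pairing bytes this way is also why Scilab's array is half as long. A minimal sketch of the whole-buffer conversion, assuming mono s16 data as in the test files (s16le_bytes_to_doubles is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts mono 16-bit signed little-endian PCM bytes
// into doubles on Scilab's scale (-1.0 up to just under +1.0).
std::vector<double> s16le_bytes_to_doubles(const std::vector<uint8_t>& bytes) {
    assert(bytes.size() % 2 == 0); // every sample is exactly two bytes
    std::vector<double> samples;
    samples.reserve(bytes.size() / 2); // half as many samples as bytes
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Low byte first, then high byte: 0..65535
        int value = bytes[i] | (bytes[i + 1] << 8);
        // Reinterpret as two's complement: -32768..32767
        if (value >= 0x8000) {
            value -= 0x10000;
        }
        samples.push_back(value / 32768.0);
    }
    return samples;
}
```

Applied to 51, 243, 84, 243, 117, 243 this gives about -0.1000, -0.0990 and -0.0980, the values Scilab shows.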

Code below for reference:

while (av_read_frame(formatContext, &readingPacket) == 0) {
    if (readingPacket.stream_index == audioStreamIdx) {
        AVPacket decodingPacket = readingPacket;

        while (decodingPacket.size > 0) {
            int gotFrame = 0;
            int result = avcodec_decode_audio4(context, frame, &gotFrame, &decodingPacket);

            if (result < 0) {
                break; // decoding error: skip the rest of this packet
            }

            // Bytes of the packet actually consumed by the decoder
            decoded = FFMIN(result, decodingPacket.size);

            if (gotFrame) {
                // Bytes per sample, e.g. 2 for 16-bit audio; 0 means unknown format
                data_size = av_get_bytes_per_sample(context->sample_fmt);
                if (data_size <= 0) {
                    break;
                }

                // Only for 1 channel temporarily: with more channels this is
                // correct only for planar formats (packed audio is interleaved in data[0])
                for (int i = 0; i < frame->nb_samples; i++) {
                    for (int ch = 0; ch < context->channels; ch++) {
                        for (int j = 0; j < data_size; j++) {
                            bytevector.push_back(*(frame->data[ch] + data_size * i + j));
                        }
                    }
                }

                // Advance past the consumed bytes
                decodingPacket.size -= decoded;
                decodingPacket.data += decoded;
            } else {
                // No frame produced: stop processing this packet
                decodingPacket.size = 0;
                decodingPacket.data = NULL;
            }
        }
    }
    av_free_packet(&readingPacket);
}
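The inner loop addresses sample i of channel ch as frame->data[ch] + data_size * i, which matches planar sample formats (one plane per channel); packed formats interleave all channels in frame->data[0]. A minimal sketch of both layouts; sample_ptr is a hypothetical helper, and the planar flag is assumed to come from av_sample_fmt_is_planar(context->sample_fmt):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pointer to the first byte of sample i of channel ch.
// 'planes' plays the role of frame->data.
//  - planar (e.g. s16p): channel ch lives in its own plane, planes[ch]
//  - packed (e.g. s16):  channels are interleaved in planes[0]
const uint8_t* sample_ptr(const uint8_t* const* planes, bool planar,
                          int i, int ch, int channels, int bytes_per_sample) {
    assert(i >= 0 && ch >= 0 && ch < channels && bytes_per_sample > 0);
    if (planar) {
        return planes[ch] + static_cast<std::size_t>(i) * bytes_per_sample;
    }
    return planes[0] +
           (static_cast<std::size_t>(i) * channels + ch) * bytes_per_sample;
}
```

For a single channel both branches return the same address, which is why the loop works for the mono test files.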

A quick way to transform two bytes into a float:

uint8_t bits[] = {195, 255}; // first sample in the test s16 wav file (low byte first)
int16_t sample;
std::memcpy(&sample, bits, sizeof(sample)); // relies on a little-endian host, matching WAV byte order
std::cout << sample * (1.0f / 32768.0f) << std::endl;

Printed with more precision (setprecision(xx)), this code yields -0.001861572265625, which is the first number Scilab gives for the same file.

I hope this helps anybody with similar issues.

Audio data is stored in many different formats, so the fact that you get a uint8_t[] array tells you rather little. It's not necessarily one byte per sample; instead, you need to know the format. Here -b16 tells me that the uint8_t[] data is in fact 16-bit PCM-encoded data, i.e. every two consecutive bytes (low byte first, as WAV stores them) form one sample on a scale from -32768 to +32767. Scilab appears to prefer a floating-point scale and therefore divides by 32768.0. That's just a representation change: it merely shrinks the scale to the range -1.0 to just under +1.0.
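To make the point concrete, the same raw bytes decode to different values depending on the sample format. A minimal sketch, assuming little-endian data and a hypothetical SampleFormat tag that mirrors a few libav formats: U8 is unsigned with silence at 128, S16 and S32 are signed and divided by 2^15 and 2^31, and FLT is already nominally in -1.0..1.0:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical tag mirroring AV_SAMPLE_FMT_U8, _S16, _S32 and _FLT.
enum class SampleFormat { U8, S16, S32, FLT };

// Bytes occupied by one sample of one channel.
int bytes_per_sample(SampleFormat fmt) {
    switch (fmt) {
        case SampleFormat::U8:  return 1;
        case SampleFormat::S16: return 2;
        case SampleFormat::S32: return 4;
        case SampleFormat::FLT: return 4;
    }
    return 0;
}

// Decodes the little-endian sample starting at p into a double.
double sample_to_double(const uint8_t* p, SampleFormat fmt) {
    assert(p != nullptr);
    switch (fmt) {
        case SampleFormat::U8:
            // Unsigned 8-bit: shift 0..255 to -128..127 before scaling.
            return (p[0] - 128) / 128.0;
        case SampleFormat::S16: {
            int32_t v = p[0] | (p[1] << 8);
            if (v >= 0x8000) v -= 0x10000;              // 16-bit two's complement
            return v / 32768.0;
        }
        case SampleFormat::S32: {
            uint32_t u = uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
                         (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
            int64_t v = u;
            if (v >= 0x80000000LL) v -= 0x100000000LL;  // 32-bit two's complement
            return v / 2147483648.0;
        }
        case SampleFormat::FLT: {
            float f;
            std::memcpy(&f, p, sizeof f);  // assumes a little-endian host
            return f;                      // already on the -1.0..1.0 scale
        }
    }
    return 0.0;
}
```

Stepping through a buffer then means advancing by bytes_per_sample(fmt) per sample (times the channel count for packed data).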

Compare it to angles: a right angle is 90 degrees or pi/2 radians; the exact number doesn't matter, since both are a quarter of a full circle.
