
Writing read_jpeg and decode_jpeg functions for TensorFlow Lite C++

TensorFlow Lite has a good C++ image classification example in its repo. However, I'm working with .jpeg files, and the example is restricted to decoding .bmp images with bitmap_helpers.cc.

I'm trying to create my own JPEG decoder, but I'm not well versed in image processing, so I could use some help. I'm reusing an existing JPEG decoder as a third-party helper library. In the example's BMP decoding, I don't understand why row_size is calculated or why the byte array is read starting after the header. Could anyone shed some light on how this would apply to a JPEG decoder? Or, even better, is there already a C++ decode_jpeg function hiding somewhere that I have not found?

The final implementation must be in TensorFlow Lite in C++.

Thank you so much!

EDIT:

Below is what I have so far. I don't get the same confidence values as the Python image classifier example gives for the same input image and tflite model, which clearly indicates that something is wrong. I essentially copied and pasted the row_size calculation from read_bmp without understanding it, so I suspect that might be the issue. What is row_size meant to represent?

std::vector<uint8_t> decode_jpeg(const uint8_t* input, int row_size, int width, int height) {

    // Channels will always be 3. Hardcode it for now.
    int channels = 3;

    // The output that will contain the data for TensorFlow to process.
    std::vector<uint8_t> output(height * width * channels);

    // Go through every pixel of the image.
    for(int i = 0; i < height; i++) {
            int src_pos;
            int dst_pos;

            for(int j = 0; j < width; j++) {

                    src_pos = i * row_size + j * channels;
                    dst_pos = (i * width + j) * channels;

                    // Put RGB channel data into the output array.
                    output[dst_pos] = input[src_pos + 2];
                    output[dst_pos + 1] = input[src_pos + 1];
                    output[dst_pos + 2] = input[src_pos];
            }
    }

    return output;
}

std::vector<uint8_t> read_jpeg(const std::string& input_jpeg_name, int* width, int* height, Settings* s) {

    // Size and buffer.
    size_t size;
    unsigned char *buf;

    // Open the input file.
    FILE *f;
    f = fopen(input_jpeg_name.c_str(), "rb");
    if (!f) {
            if (s->verbose) LOG(INFO) << "Error opening the input file\n";
            exit(-1);
    }

    // Read the file.
    fseek(f, 0, SEEK_END);

    // Get the file size.
    size = ftell(f);

    // Get file data into buffer.
    buf = (unsigned char*)malloc(size);
    fseek(f, 0, SEEK_SET);
    size_t read = fread(buf, 1, size, f);
    
    // Close the file.
    fclose(f);

    // Decode the file.
    Decoder decoder(buf, size);
    if (decoder.GetResult() != Decoder::OK)
    {
            if (s->verbose) LOG(INFO) << "Error decoding the input file\n";
            exit(-1);
    }

    // Get the image from the decoded file.
    unsigned char* img = decoder.GetImage();

    // Get image width and height.
    *width = decoder.GetWidth();
    *height = decoder.GetHeight();

    // TODO: Understand what this row size means. Don't just copy and paste.
    const int channels = 3;
    const int row_size = (8 * channels * *width + 31) / 32 * 4;

    // Decode the JPEG.
    return decode_jpeg(img, row_size, *width, *height);
}

The library you are using already handles decoding for you: decoder.GetImage() returns raw RGB data. You do not need to calculate any row sizes at all.

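row_size is specific to the BMP file format. Each row of BMP pixel data is padded to a multiple of 4 bytes, so a row may contain padding bytes in addition to the pixel color data. The row_size formula computes the padded length of one row, and the BMP code uses it to skip that padding.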
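To make the padding concrete, here is a minimal sketch of the same row_size arithmetic as a standalone function. The names bmp_row_size and bmp_row_padding are hypothetical helpers introduced for illustration; they are not part of TensorFlow Lite or of the decoder library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes one row of BMP pixel data occupies,
// including the padding that rounds each row up to a multiple of 4 bytes.
std::size_t bmp_row_size(std::size_t width, std::size_t channels) {
    assert(width > 0 && channels > 0);
    const std::size_t bits_per_row = 8 * channels * width;
    // Round up to a whole number of 32-bit words, then convert words to bytes.
    return (bits_per_row + 31) / 32 * 4;
}

// Hypothetical helper: padding bytes at the end of each row (0 to 3).
std::size_t bmp_row_padding(std::size_t width, std::size_t channels) {
    return bmp_row_size(width, channels) - width * channels;
}
```

For a 3-channel image that is 5 pixels wide, a row holds 15 bytes of pixel data and is padded to 16 bytes. A tightly packed RGB buffer has no such padding, so its row length is simply width * channels.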

BMP files also store pixel values in BGR order, which is why your original code reverses the channel order:

// Put RGB channel data into the output array.
output[dst_pos] = input[src_pos + 2];
output[dst_pos + 1] = input[src_pos + 1];
output[dst_pos + 2] = input[src_pos];
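For comparison, this is roughly what a BMP-style conversion has to do: skip the row padding and swap BGR to RGB. It is a sketch only, with a hypothetical function name (bgr_padded_to_rgb), and it ignores other BMP details such as the header and the row order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert padded BGR rows (BMP-style layout) into a
// tightly packed RGB buffer with no padding between rows.
std::vector<std::uint8_t> bgr_padded_to_rgb(const std::vector<std::uint8_t>& input,
                                            std::size_t width, std::size_t height) {
    const std::size_t channels = 3;
    // Padded length of one source row, rounded up to a multiple of 4 bytes.
    const std::size_t row_size = (8 * channels * width + 31) / 32 * 4;
    assert(input.size() >= row_size * height);

    std::vector<std::uint8_t> output(width * height * channels);
    for (std::size_t i = 0; i < height; ++i) {
        for (std::size_t j = 0; j < width; ++j) {
            const std::size_t src = i * row_size + j * channels;  // padded source
            const std::size_t dst = (i * width + j) * channels;   // packed output
            output[dst]     = input[src + 2];  // R
            output[dst + 1] = input[src + 1];  // G
            output[dst + 2] = input[src];      // B
        }
    }
    return output;
}
```

Applying this to data that is already tightly packed RGB, such as the decoder's output, would read from the wrong offsets whenever the row needs padding and would swap the red and blue channels, which is consistent with the different confidence values you observed.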

The code below should work for you (note that the decode_jpeg function no longer performs any decoding; it only copies the pixel data):

std::vector<uint8_t> decode_jpeg(const uint8_t* input, int width, int height) {

    // Channels will always be 3. Hardcode it for now.
    int channels = 3;

    // The output that will contain the data for TensorFlow to process.
    std::vector<uint8_t> output(height * width * channels);

    // Copy the pixel data to the output unchanged: it is already RGB with no row padding.
    const size_t total = output.size();
    for (size_t i = 0; i < total; ++i)
    {
        output[i] = input[i];
    }

    return output;
}

std::vector<uint8_t> read_jpeg(const std::string& input_jpeg_name, int* width, int* height, Settings* s) {

    // Size and buffer.
    size_t size;
    unsigned char *buf;

    // Open the input file.
    FILE *f;
    f = fopen(input_jpeg_name.c_str(), "rb");
    if (!f) {
            if (s->verbose) LOG(INFO) << "Error opening the input file\n";
            exit(-1);
    }

    // Read the file.
    fseek(f, 0, SEEK_END);

    // Get the file size.
    size = ftell(f);

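    // Get file data into buffer and make sure the whole file was read.
    buf = (unsigned char*)malloc(size);
    fseek(f, 0, SEEK_SET);
    size_t read = fread(buf, 1, size, f);
    if (read != size) {
            if (s->verbose) LOG(INFO) << "Error reading the input file\n";
            exit(-1);
    }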
    
    // Close the file.
    fclose(f);

    // Decode the file.
    Decoder decoder(buf, size);
    if (decoder.GetResult() != Decoder::OK)
    {
            if (s->verbose) LOG(INFO) << "Error decoding the input file\n";
            exit(-1);
    }

    // Get the image from the decoded file.
    unsigned char* img = decoder.GetImage();

    // Get image width and height.
    *width = decoder.GetWidth();
    *height = decoder.GetHeight();

    // Decode the JPEG.
    return decode_jpeg(img, *width, *height);
}
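As an optional variation, the file-reading part could be written with std::ifstream and a std::vector, so the buffer is released automatically instead of being managed with malloc. This is a minimal sketch; read_file_bytes is a hypothetical helper name, not something provided by TensorFlow Lite or the decoder library.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file into a byte vector.
// Returns false if the file cannot be opened.
bool read_file_bytes(const std::string& path, std::vector<std::uint8_t>* bytes) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return false;
    }
    // Copy every byte from the stream buffer into the vector.
    bytes->assign(std::istreambuf_iterator<char>(file),
                  std::istreambuf_iterator<char>());
    return true;
}
```

The resulting bytes.data() and bytes.size() could then be passed to the Decoder in place of buf and size, assuming its constructor accepts a pointer and a length as in the code above.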
