
Loading images takes a very long time in C++ using OpenCV on Windows 8.1

I am currently working on a data-driven learning application in C++. I have a huge amount of data: over 300,000 images, around 3 GB in total.

Regarding my working environment:

  • Windows 8.1, 64bit
  • Visual Studio 2013
  • OpenCV
  • OpenMP

My hardware:

  • i7-3770
  • 8GB RAM
  • SSD (System and Visual Studio)
  • HDD

In short, my problem is that loading just the images (3 GB in total) takes over 3 hours, which I would like to improve.

The implementation is as follows: First, I load some information regarding the images (not the images themselves) from a file. Internally I use a standard vector holding 300,000 pointers to my class Item. Item holds the information loaded from the file and the image (an OpenCV Mat), which is NOT loaded yet. Some independent intermediate steps follow. After that, I iterate through my vector, parallelized using OpenMP, and load the image for each Item using:

imread(PATH_TO_FILE, CV_LOAD_IMAGE_UNCHANGED);
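
For reference, CV_LOAD_IMAGE_UNCHANGED (which equals -1) decodes each file exactly as stored, keeping its channel count and bit depth, including an alpha channel if present. The other load flags force a format instead, for example:

imread(PATH_TO_FILE, CV_LOAD_IMAGE_GRAYSCALE); // forces a single-channel 8-bit image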

What is really strange to me is that the loading time does not scale linearly with the number of images: 22,000 images take around 22 seconds, 44,000 images take 1 minute 43 seconds, 66,000 images take around 4 minutes, and so on.
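
To check whether imread itself slows down, or whether the growth comes from holding all the decoded images at once, here is a minimal timing sketch (the file naming scheme is made up; adjust it to the real data set) that discards each decoded Mat:

#include "opencv2/highgui/highgui.hpp"
#include <chrono>
#include <cstdio>

using namespace cv;
using namespace std::chrono;

int main() {
    auto start = steady_clock::now();
    char fname[256];
    for (int i = 0; i < 66000; i++) {
        sprintf(fname, "images/img%06d.png", i); // hypothetical file names
        Mat img = imread(fname, CV_LOAD_IMAGE_UNCHANGED); // decoded, then dropped
        if (i % 22000 == 21999) { // report after each batch of 22,000
            long long ms = duration_cast<milliseconds>(steady_clock::now() - start).count();
            printf("%d images: %lld ms\n", i + 1, ms);
        }
    }
    return 0;
}

If the per-batch times stay roughly constant here, decoding and disk I/O are linear, and the superlinear growth comes from keeping all the Mats in memory.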

I am unsure whether this problem is due to a hardware bottleneck (which I assumed at first) or due to an implementation flaw on my side. I have already considered a lot, for example:

  • halving the image bit depth, and thus the memory size, does not decrease the time taken
  • the virtual memory of my application peaks at around 4 GB, so there should not be much swapping involved
  • there is no difference between loading the data from my system SSD and loading it from my HDD
  • giving the process a higher priority (I chose the highest, realtime) improved the runtime a little; the runtimes given above already include this improvement
  • even though I am using OpenMP, the Resource Monitor states that my application only uses 15% of the CPU while loading the images. Using prints, I can tell that all 8 workers share the loading, though
  • I shrank the vector using vector.shrink_to_fit()

These facts suggest to me that this is not a hardware problem but an implementation flaw. Is it inefficient to use a huge vector holding over 300,000 pointers? Is there anything regarding OpenCV's Mat I have not considered? Any hints on how I can pinpoint the problem further? I appreciate any suggestions on what is causing this behaviour and how I could get around it.

Thanks in advance!

EDIT: How I load the images into the vector. Note that I renamed some things, so there might be typos.

void LoadAllImages()
{
    for (int i = 0; i < data->size(); i++)
    {
        Item* cur_item = data->at(i);
        cur_item->setImage(cur_item->loadImage());
    }
}


Mat Item::loadImage()
{
    return imread(IMAGES_PATH + image_name_, CV_LOAD_IMAGE_UNCHANGED);
}


void Item::setImage(Mat img)
{
    img_ = img;
}
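
A side note on the snippet above, in case the copies look expensive: cv::Mat is reference counted, so passing and assigning it by value copies only a small header, not the pixel data. A quick illustration:

Mat a = imread("some.png", CV_LOAD_IMAGE_UNCHANGED);
Mat b = a;         // shallow copy: b shares a's pixel buffer
Mat c = a.clone(); // deep copy: allocates new memory and copies the pixels

So setImage(Mat img) is cheap by design and should not be where the memory goes.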

EDIT2: How I set up the vector without the images. Note that I used Boost's multithreading for this part. Also note that this part's execution time increases linearly with the data.

void foo(vector<Item*>* data, const string file_path, const string file_name)
{
    // open the file into an input stream named "file" (code omitted here)

    string image_name;
    boost::mutex data_mutex;
    boost::thread_group thread_group;
    while (file >> image_name)
    {
        // read other data for the current image into other_data_read (code omitted here)

        thread_group.add_thread(new boost::thread(addDataToVectorThread, data, image_name, other_data_read, &data_mutex));
    }
    thread_group.join_all();
}


void FileHandler::addDataToVectorThread(vector<Item*>* data, string image_name, vector<float> other_data, boost::mutex* data_mutex)
{
    Item* item = new Item(other_data, image_name);
    data_mutex->lock();
    data->push_back(item);
    data_mutex->unlock();
}
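
A minor point on this part, as a sketch assuming the total image count is known up front (e.g. from a first pass over the file): reserving the vector avoids repeated reallocation inside push_back. With 300,000 pointers at 8 bytes each the vector itself only occupies ~2.4 MB, so this is unlikely to explain the slowdown, but it is cheap to do:

data->reserve(image_count); // image_count is an assumed, precomputed total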

EDIT3: I tried the code provided by SSteve and was able to narrow down my problem. His code generates random images of the same size as mine, 96x96 with a color depth of 8 bit. Note that I changed his code to generate only grayscale images, as mine are. Loading 300,000 images on my laptop took about 10 minutes, which is fine.

I simplified my code as much as possible and removed ALL multi-threading. I changed my code so that the images are loaded directly into the Items, i.e. during vector creation.

Watching the Resource Monitor, I noticed that my images take a LOT of memory: loading 10,000 images already takes 1 GB. Using CV_LOAD_IMAGE_GRAYSCALE instead of CV_LOAD_IMAGE_UNCHANGED halves the memory consumption. I don't get it: my images are definitely 96x96 at 8 bit, and that is still way too much.

Using my full code but loading the random single-channel images created by SSteve's code takes 100 MB of memory for 10,000 images, plus some additional overhead. The images alone should take ~90 MB (96 × 96 × 1 byte ≈ 9.2 KB per image, times 10,000 ≈ 92 MB), so that is fine. This is only a small fraction of what my images take.

In short: my images seem to cause the problem, but I don't get why.

How I got these images: the images I use for the problematic part of my algorithm are preprocessed by me. This preprocessing step is independent and basically scales the images down. The source images are 240x320 with 16-bit depth; I then scaled them to 96x96 with 8-bit depth.

Is it possible that, for some reason, the scaled-down images are stored at the correct size, and this size is correctly displayed in the image properties by Windows, but the files still contain some information which should have been removed, so that they take up more memory than they should? It does not make any sense to me.
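
One way to test this, as a minimal check reusing the names from the snippets above: print what imread actually returns for one of the preprocessed images, since a file that looks like 96x96 8-bit grayscale in Windows' file properties may still decode with more channels or a higher bit depth:

Mat img = imread(IMAGES_PATH + image_name_, CV_LOAD_IMAGE_UNCHANGED);
printf("cols=%d rows=%d channels=%d depth=%d bytes=%lu\n",
       img.cols, img.rows, img.channels(), img.depth(),
       (unsigned long)(img.total() * img.elemSize()));
// depth() codes: 0 = CV_8U, 2 = CV_16U, ...
// A true 96x96 8-bit grayscale image should report 9216 bytes.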

Thanks so far for all the help.

I don't think OpenCV is your bottleneck. I did a test on my 2009-vintage 2.8 GHz Core 2 Duo MacBook Pro with 8 GB RAM running OS X 10.11.3. I was able to load 300,000 images in 3.3 minutes. 150,000 images took 1.5 minutes.

Here's the program I used to create the 300,000 images. They take about 8.6 GB of space on my hard drive.

#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"

using namespace cv;

class Item;

int main(int argc, char *argv[]) {

    Mat image(Size(96,96), CV_8UC3);
    RNG rng;
    char fname[256];
    for (int i = 0; i < 300000; i++) {
        rng.fill(image, RNG::UNIFORM, 0, 256);
        sprintf(fname, "img%06d.png", i);
        imwrite(fname, image);
        if (0 == i % 500) {
            printf("%d\n", i);
        }
    }
    return 0;
}

Here's the program I used to create the vector of Item s and load the images. I think it's similar enough to the code snippets in your question to duplicate the issue.

#include "opencv2/core.hpp"
#include "opencv2/highgui.hpp"

using namespace std;
using namespace cv;

#define CV_LOAD_IMAGE_UNCHANGED -1

String IMAGES_PATH = "/Users/steve/Development/tests/so35602911/images/";

class Item {
public:
    String image_name;
    Mat img_;
    Mat loadImage();
    void setImage(Mat img);
};

Mat Item::loadImage() {
    return imread(IMAGES_PATH + image_name, CV_LOAD_IMAGE_UNCHANGED);
}

void Item::setImage(Mat img) {
    img_ = img;
}

int main(int argc, char *argv[]) {
    int imagesToProcess = 300000;

    vector<Item*> items;
    char filename[256];
    for (int i = 0; i < imagesToProcess; i++) {
        Item *theItem = new Item;
        sprintf(filename, "img%06d.png", i);
        theItem->image_name = filename;
        items.push_back(theItem);
    }

    printf("Set up %lu items.\n", items.size());

    time_t startTime = time(0);
    for (int i = 0; i < items.size(); i++) {
        Item* cur_item = items[i];
        cur_item->setImage(cur_item->loadImage());
    }
    time_t endTime = time(0);

    printf("%lu images. Finished in %.1f minutes.\n", items.size(), (endTime - startTime) / 60.0);

    //Show the last image just to prove they got loaded
    //imshow("last", items[items.size() - 1]->img_);
    //waitKey(0);

    return 0;
}

I'd suggest removing the code that parallelizes the image loading. As pointed out in the comments, file I/O doesn't parallelize well.
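
If parallelism is still wanted later, one pattern worth trying, sketched here against the items vector from the program above (and assuming the compressed bytes fit in RAM; with ~3 GB of files the two loops would need to run in chunks), is to keep the disk reads serial and parallelize only the decoding via cv::imdecode:

#include <fstream>
#include <iterator>

// Serial I/O: read the raw, still-compressed file bytes one file at a time.
vector<vector<uchar> > buffers(items.size());
for (size_t i = 0; i < items.size(); i++) {
    ifstream f((IMAGES_PATH + items[i]->image_name).c_str(), ios::binary);
    buffers[i].assign(istreambuf_iterator<char>(f), istreambuf_iterator<char>());
}

// Parallel decode: decompression is CPU-bound and parallelizes well.
#pragma omp parallel for
for (int i = 0; i < (int)items.size(); i++) {
    items[i]->img_ = imdecode(buffers[i], CV_LOAD_IMAGE_UNCHANGED);
}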

If that doesn't help, you should try to run your program (or get someone to run it for you) on Unix or OS X to see if Windows is the culprit.
