
How to loop faster over a large image dataset using OpenCV and Python?

I have a dataset of 100,000 random images.

I used the following code on this dataset, but the processing speed is terribly slow (on an AWS GPU instance).

import cv2
from progressbar import ProgressBar

pbar = ProgressBar()

def image_to_feature_vector(image, size=(128, 128)):
    # Resize to a fixed size and flatten into a 1-D feature vector
    return cv2.resize(image, size).flatten()

imagePath = []  # list of paths to the dataset images
data = []

# Load each image, convert it to grayscale, and extract its feature vector
for i in pbar(range(len(imagePath))):
    image = cv2.imread(imagePath[i])
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    features = image_to_feature_vector(image)
    data.append(features)

How can I improve the processing speed?

The real solution depends on where the bottleneck actually is, so profile the loop before optimizing.

In any case, the time spent reading (loading) images from disk is time you can overlap with computation.
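As a quick way to see where the time goes (my illustration, not code from the original answer; it reuses the `imagePath` list and `image_to_feature_vector` function from the question), you can time the reads and the processing separately on a sample:

import time
import cv2

def image_to_feature_vector(image, size=(128, 128)):
    return cv2.resize(image, size).flatten()

imagePath = []  # the list of image paths from the question
sample = imagePath[:1000]  # a sample is enough to see the ratio

t0 = time.perf_counter()
images = [cv2.imread(p) for p in sample]          # disk I/O + decode
t1 = time.perf_counter()
data = [image_to_feature_vector(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
        for img in images if img is not None]     # CPU work
t2 = time.perf_counter()

print("read:    %.2f s" % (t1 - t0))
print("process: %.2f s" % (t2 - t1))

If the read time dominates, overlapping I/O with processing (as described below) is where the win is.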

Your process is sequential:

[Figure: sequential pipeline — load, process, and store one image at a time]

In scenarios like this I use something called an I/O pipeline or parallel pipeline. The idea is to use one thread to load the images serially and feed them to multiple processing threads. That way, while your input thread is reading, one or more worker threads are using the CPUs to process the previously loaded images. Use a single thread to write out the data serially as well:

[Figure: parallel pipeline — one reader thread feeding multiple worker threads, with one writer thread]

Unfortunately I don't use Python enough to write an example of this, but the pattern is likely already implemented in a Python threading framework. A minimal sketch follows.
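Here is a rough Python sketch of the pattern (my illustration, not code from the original answer; it reuses the `imagePath` list and `image_to_feature_vector` function from the question), built on the standard-library `threading` and `queue` modules:

import threading
from queue import Queue

import cv2

NUM_WORKERS = 4        # tune to the number of available cores
SENTINEL = None        # placed on the queue to tell workers to stop

def image_to_feature_vector(image, size=(128, 128)):
    return cv2.resize(image, size).flatten()

def reader(paths, q):
    # Single I/O thread: reads images from disk serially
    for path in paths:
        image = cv2.imread(path)
        if image is not None:
            q.put(image)
    for _ in range(NUM_WORKERS):
        q.put(SENTINEL)  # one stop signal per worker

def worker(q, out, lock):
    # CPU thread: processes images while the reader keeps loading new ones
    while True:
        image = q.get()
        if image is SENTINEL:
            break
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        with lock:  # serialize writes to the shared output list
            out.append(image_to_feature_vector(gray))

imagePath = []           # the list of image paths from the question
q = Queue(maxsize=128)   # bounded, so the reader cannot run far ahead of the workers
data = []
lock = threading.Lock()

threads = [threading.Thread(target=reader, args=(imagePath, q))]
threads += [threading.Thread(target=worker, args=(q, data, lock))
            for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# data now holds one feature vector per readable image (order not preserved)

OpenCV releases the GIL inside most of its C++ routines, so plain threads genuinely run the decoding and resizing in parallel here; if the per-image processing were pure Python, `multiprocessing` would be the usual substitute. Note also that `data` ends up unordered; carry the index through the queue if order matters.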

I use this approach to grab camera frames and process them at high speed, but I use C++ for it. If you don't mind programming in C++, you may find something inspiring in this impressive answer.
