
How to loop faster over a large image dataset using OpenCV and Python?

I have a dataset of 100,000 random images.

I used the following code on this dataset, but the processing speed is terribly slow (on an AWS GPU instance).

import cv2
from progressbar import ProgressBar

pbar = ProgressBar()

def image_to_feature_vector(image, size=(128, 128)):
    # Resize to a fixed size and flatten into a 1-D feature vector
    return cv2.resize(image, size).flatten()

imagePath = []  # list of paths to the dataset images
data = []

# Load each image, convert it to grayscale, and extract its feature vector
for i in pbar(range(len(imagePath))):
    image = cv2.imread(imagePath[i])
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    features = image_to_feature_vector(image)
    data.append(features)

How can I improve the processing speed?

The real solution depends on where the bottleneck actually is, so profile the loop before optimizing.

In any case, the time spent reading (loading) images from disk is time you can overlap with computation.
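As a quick way to see where the time goes (my illustration, not code from the original answer; it reuses the `imagePath` list and `image_to_feature_vector` function from the question), you can time the reads and the processing separately on a sample:

import time
import cv2

def image_to_feature_vector(image, size=(128, 128)):
    return cv2.resize(image, size).flatten()

imagePath = []  # the list of image paths from the question
sample = imagePath[:1000]  # a sample is enough to see the ratio

t0 = time.perf_counter()
images = [cv2.imread(p) for p in sample]          # disk I/O + decode
t1 = time.perf_counter()
data = [image_to_feature_vector(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
        for img in images if img is not None]     # CPU work
t2 = time.perf_counter()

print("read:    %.2f s" % (t1 - t0))
print("process: %.2f s" % (t2 - t1))

If the read time dominates, overlapping I/O with processing (as described below) is where the win is.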

Your process is sequential:

[Figure: sequential pipeline — load, process, and store one image at a time]

In scenarios like this I use something called an I/O pipeline or parallel pipeline. The idea is to use one thread to load the images serially and feed them to multiple processing threads. That way, while your input thread is reading, one or more worker threads are using the CPUs to process the previously loaded images. Use a single thread to write out the data serially as well:

[Figure: parallel pipeline — one reader thread feeding multiple worker threads, with one writer thread]

Unfortunately I don't use Python enough to write an example of this, but the pattern is likely already implemented in a Python threading framework. A minimal sketch follows.
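Here is a rough Python sketch of the pattern (my illustration, not code from the original answer; it reuses the `imagePath` list and `image_to_feature_vector` function from the question), built on the standard-library `threading` and `queue` modules:

import threading
from queue import Queue

import cv2

NUM_WORKERS = 4        # tune to the number of available cores
SENTINEL = None        # placed on the queue to tell workers to stop

def image_to_feature_vector(image, size=(128, 128)):
    return cv2.resize(image, size).flatten()

def reader(paths, q):
    # Single I/O thread: reads images from disk serially
    for path in paths:
        image = cv2.imread(path)
        if image is not None:
            q.put(image)
    for _ in range(NUM_WORKERS):
        q.put(SENTINEL)  # one stop signal per worker

def worker(q, out, lock):
    # CPU thread: processes images while the reader keeps loading new ones
    while True:
        image = q.get()
        if image is SENTINEL:
            break
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        with lock:  # serialize writes to the shared output list
            out.append(image_to_feature_vector(gray))

imagePath = []           # the list of image paths from the question
q = Queue(maxsize=128)   # bounded, so the reader cannot run far ahead of the workers
data = []
lock = threading.Lock()

threads = [threading.Thread(target=reader, args=(imagePath, q))]
threads += [threading.Thread(target=worker, args=(q, data, lock))
            for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# data now holds one feature vector per readable image (order not preserved)

OpenCV releases the GIL inside most of its C++ routines, so plain threads genuinely run the decoding and resizing in parallel here; if the per-image processing were pure Python, `multiprocessing` would be the usual substitute. Note also that `data` ends up unordered; carry the index through the queue if order matters.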

I use this approach to grab camera frames and process them at high speed, but I use C++ for it. If you don't mind programming in C++, you may find something inspiring in this impressive answer.
