
Improve speed of I/O operations for HDF5 data creation for Caffe?

The basic objective of my program is to read images and build HDF5 files from them. I'm splitting the HDF5 data into files of 1000 samples each for manageability.

The program reads and resizes the images and then writes them to file.

I don't think multi-threading would improve the speed of this, but I might be wrong.

My dataset is around 15 million images.

I'm using a powerful PC with a 4 GB GPU, 32 GB of RAM and an Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz.

PS: I could try some other image-transformation package like OpenCV, but I have no basis for comparison.

As of now the program has been running non-stop for 3 days and is almost 80% done. I would like to avoid this problem in the future when I do something similar.

ipfldr= "/path/to/img/fldr"
os.chdir(ipfldr)
SIZE = 58 # fixed size to all images
nof = 16

with open( '/path/to/txtfile', 'r' ) as T :
    lines = T.readlines()


# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
print len(lines)
X = np.zeros( (1000,nof*3, SIZE, SIZE), dtype=np.int )
y = np.zeros( (1000,1), dtype=np.int )
for i,l in enumerate(lines):
    sp = l.split(' ')#split files into 17 cats
    cla= int(sp[0].split("/")[0])
    for fr in range(0,nof,1):
        img = caffe.io.load_image( sp[fr] )
        img = caffe.io.resize( img, (3,SIZE, SIZE) ) # resize to fixed size
        # you may apply other input transformations here...
        X[i%1000,fr:fr+3] = img
    y[i%1000] = cla
    if i%1000==0 
        with h5py.File('val/'+'val'+str(int(i/1000))+'.h5','w') as H:
            H.create_dataset( 'data', data=X ) # note the name X given to the dataset!
            H.create_dataset( 'label', data=y ) # note the name y given to the dataset! 
        with open('val_h5_list.txt','w') as L:
            L.write( 'val'+str(int(i/1000))+'.h5' ) # list all h5 files you are going to use
        if (len(lines)-i >= 1000):
            X = np.zeros( (1000,nof*3, SIZE, SIZE), dtype=np.int )
            y = np.zeros( (1000,1), dtype=np.int )
        else:
            break
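
On the OpenCV option mentioned above: the per-image decode/resize is where most of the CPU time usually goes, and cv2 is often noticeably faster than the skimage-based caffe.io.load_image path. A minimal sketch of an equivalent load-and-resize step (load_and_resize is a hypothetical helper, not part of the code above; the RGB conversion and [0, 1] scaling mimic what caffe.io.load_image returns):

import cv2
import numpy as np

def load_and_resize(path, size):
    """Hypothetical cv2-based replacement for the caffe.io load + resize pair."""
    img = cv2.imread(path)                      # HxWx3, BGR, uint8
    img = cv2.resize(img, (size, size))         # dsize is given as (width, height)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # caffe.io.load_image returns RGB
    img = img.astype(np.float32) / 255.0        # and floats scaled to [0, 1]
    return img.transpose(2, 0, 1)               # CxHxW, ready to drop into X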

I am quite sure you can improve your performance with a multi-threaded approach. You haven't been spending 3 days just loading data from disk (that would imply an unrealistic amount of data to read), so you are most likely waiting on the CPU-bound resize step.

You could, for example, have: 1 reader that reads data in large chunks and puts single images on a queue; some workers that each take an image off that queue, resize it, and put it on a second queue; and 1 writer that takes the resized images off the second queue and writes them to disk once it has gathered a large batch (the reader and the writer can probably be the same process without loss of efficiency, assuming you read and write to the same disk anyway). A minimal sketch of such a pipeline is given at the end of this answer.

My guess is that 1 worker per hardware thread (20 in your case, since the E5-2687W v3 has 10 cores / 20 threads), minus 2 for the cores you put the reader and the writer on (so 18), should be a good starting point.
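
If you'd rather not hard-code that number, the hardware-thread count can be queried at run time; a small sketch (the minus 2 follows the reader/writer suggestion above):

import multiprocessing as mp

# Reserve two hardware threads for the reader and the writer;
# the remaining ones do the CPU-bound resizing.
n_workers = max(1, mp.cpu_count() - 2)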

This way you isolate waits for I/O from the CPU work, and you minimize the I/O overhead by doing a large chunk of work each time you initiate a read or write.
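
Below is a minimal sketch of such a reader / workers / writer pipeline using multiprocessing (plain threads would be throttled by the GIL for the CPU-bound resize). Everything in it is illustrative: the one-image-per-line "path label" list format, the cv2 resize, the batch size and the output paths are assumptions, not taken from the question (which packs 16 frames per line).

import multiprocessing as mp

import cv2   # assumed resize backend; caffe.io would work here as well
import h5py
import numpy as np

SIZE = 58
BATCH = 1000
N_WORKERS = max(1, mp.cpu_count() - 2)

def reader(lines, task_q):
    # Feed file names (and labels) to the workers, then send one poison pill per worker.
    for line in lines:
        task_q.put(line)
    for _ in range(N_WORKERS):
        task_q.put(None)

def worker(task_q, result_q):
    # Decode and resize one image at a time; forward the result to the writer.
    while True:
        line = task_q.get()
        if line is None:
            result_q.put(None)          # tell the writer this worker is done
            break
        path, label = line.split()      # assumed "path label" line format
        img = cv2.imread(path)          # HxWx3, BGR, uint8
        if img is None:
            continue                    # skip unreadable files
        img = cv2.resize(img, (SIZE, SIZE))
        result_q.put((img.transpose(2, 0, 1), int(label)))   # CxHxW

def writer(result_q):
    # Collect resized images into batches of BATCH and dump each batch to HDF5.
    X = np.zeros((BATCH, 3, SIZE, SIZE), dtype=np.uint8)  # raw pixels; scale/normalize as needed
    y = np.zeros((BATCH, 1), dtype=np.int64)
    finished_workers = 0
    count = 0
    part = 0
    while finished_workers < N_WORKERS:
        item = result_q.get()
        if item is None:
            finished_workers += 1
            continue
        X[count % BATCH], y[count % BATCH] = item
        count += 1
        if count % BATCH == 0:
            with h5py.File('val/val' + str(part) + '.h5', 'w') as H:
                H.create_dataset('data', data=X)
                H.create_dataset('label', data=y)
            part += 1
    # Note: a final partial batch is dropped here, as in the original code.

if __name__ == '__main__':
    with open('/path/to/txtfile') as T:
        lines = T.readlines()
    tasks = mp.Queue(maxsize=4 * N_WORKERS)     # bounded queues keep memory use flat
    results = mp.Queue(maxsize=4 * N_WORKERS)
    procs = ([mp.Process(target=reader, args=(lines, tasks))] +
             [mp.Process(target=worker, args=(tasks, results)) for _ in range(N_WORKERS)] +
             [mp.Process(target=writer, args=(results,))])
    for p in procs:
        p.start()
    for p in procs:
        p.join()

The key design point is the bounded queues: the reader can stay a few batches ahead of the workers without ever pulling the whole 15-million-image list of pixels into RAM, and the single writer keeps the HDF5 output strictly sequential on disk.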
