简体   繁体   中英

Creating lmdb file in the right way

I'm trying to create an lmdb file that contains all of my database images (in order to train CNN).

This is my 'test code', that I took from here :

import numpy as np
import lmdb
import caffe
import cv2
import glob

N = 18

# Let's pretend this is interesting data
X = np.zeros((N, 1, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64)

# We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10


train_data = [img for img in glob.glob("/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/code/train_images/*png")]
for i , img_path in enumerate(train_data):
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    X[i]=img
    y[i]=i%2


env = lmdb.open('train', map_size=map_size)
print X
print y
with env.begin(write=True) as txn:
    # txn is a Transaction object
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
        print 'a ' + str(X[i])        
        datum.label = int(y[i])
        print 'b ' + str(datum.label)
        str_id = '{:08}'.format(i)                

        txn.put(str_id.encode('ascii'), datum.SerializeToString())

As you can see I specified random binary labels (0 or 1, for even or odd, respectively). before I create much larger lmdb file I wanna make sure that I'm doing it the right way.

After creating this file I wanted to 'look into the file' and check if it's OK, but I couldn't. the file didn't open properly using python, Access 2016, and .mdb reader (linux ubunto software). my problems are:

  1. I don't understand what this code is doing. what is str_id ? what is X[i].tobytes ? what does the last line do?

  2. After I run the code, I got 2 files: 'data.mdb' and 'key.mdb'. what are those two? maybe those 2 files are the reason why I can't open the database?

两个输出文件

Thanks a lot, really appreciate your help!

str_id is the internal name of the data set (eg one JPG image) used inside the LMDB. It's derived from the path and sequence number i .

tobytes ... here, let me search that for you. This overall process, through the end of the loop, converts the data set (datum) to the LMDB format, and then copies that binary representation straight to the file. tobytes and SerializeToString are the critical methods that transfer the bit pattern as-is.

data.mdb is the relatively huge data file, containing all of these bit sequences in a readily recoverable form. In other words, it's not blocking your DB access, because it is the data base.

lock.mdb is the record-level lock file: each datum gets appropriately locked (fully or read-only) during any read or write.

That should explain the open questions. lock will not block opening the data base; it operates only during access operations. Check your file permissions. Check your user identity as well: did the LMDB creation run as root , perhaps, and not give you read permissions? Have you tried opening it read-only with a simple-minded editor, such as vi or wordpad ?

I hope this gets you moving toward a solution.

您可以使用mdb_dump工具检查数据库的内容。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM