I am trying to shuffle the data in an existing LMDB database (trying to solve this problem). I retrieved the data, shuffled it, and wrote it back to a new LMDB. But when I checked the LMDB file size, it had shrunk. Old LMDB file size: 3792896; new LMDB file size: 2314240.
Python code implemented:
import lmdb
from random import shuffle

# Read every (key, value) pair from the existing database
lst_data = []
env = lmdb.open('val_3', readonly=True)
with env.begin() as txn:
    cursor = txn.cursor()
    for key, value in cursor:
        innerlst_data = [key, value]
        lst_data.append(innerlst_data)

# Shuffle in memory, then write the values to a new database
# under fresh zero-padded sequential keys
shuffle(lst_data)
env1 = lmdb.open('mod_val_3')
with env1.begin(write=True) as txn1:
    for i in range(len(lst_data)):
        str_id = '{:08}'.format(i)
        txn1.put(str_id.encode('ascii'), lst_data[i][1])
The code is adapted from here. Any suggestions or ideas would be helpful.
You can use mdb_stat to see the number of entries in the database. This should confirm whether your copy worked correctly.
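If the mdb_stat command-line tool is not at hand, the same check can be sketched from Python, since py-lmdb's Environment.stat() returns a dict with an entries field. The helper names read_stat and same_entry_count are made up for illustration, and the paths in the usage comment are the directories from the question. This is only a minimal sketch: equal counts show that no records were lost, not that the values are identical.

```python
def read_stat(path, **open_kwargs):
    """Open the LMDB environment at `path` read-only and return env.stat()."""
    import lmdb  # third-party py-lmdb package, imported lazily

    with lmdb.open(path, readonly=True, **open_kwargs) as env:
        return env.stat()


def same_entry_count(old_stat, new_stat):
    """Return True when both stat dicts report the same number of entries."""
    return old_stat["entries"] == new_stat["entries"]


# Usage with the directories from the question (assumes both exist):
# print(same_entry_count(read_stat("val_3"), read_stat("mod_val_3")))
```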
Newer versions of the lmdb Python wrappers (at least as of 1.3.0) include an environment copy method, which has a compact option that appears to do what @Ravi was trying to do. Use it like this (adjusting lmdb.open parameters as necessary):
# Copy old database into new one with compacting
# Old database is ~34G from deleting 200k of 400k original records
with lmdb.open(
    "200k-split.lmdb",
    map_size=109951162777,
    subdir=False,
    meminit=False,
    map_async=True,
) as env:
    env.copy(path="200k-split-compacted.lmdb", compact=True)
You can then verify that the compacted file has the same number of records as the original file...
with lmdb.open(
    "200k-split.lmdb",
    map_size=109951162777,
    subdir=False,
    meminit=False,
    map_async=True,
) as env:
    print(env.stat())
# {'psize': 4096, 'depth': 3, 'branch_pages': 19,
#  'leaf_pages': 2228, 'overflow_pages': 3600000, 'entries': 200000}

with lmdb.open(
    "200k-split-compacted.lmdb",
    map_size=109951162777,
    subdir=False,
    meminit=False,
    map_async=True,
) as env:
    print(env.stat())
# {'psize': 4096, 'depth': 3, 'branch_pages': 19,
#  'leaf_pages': 2228, 'overflow_pages': 3600000, 'entries': 200000}
...but a vastly smaller file size.
> ls -lah *.lmdb
-rw-rw-r-- 1 samueldy samueldy 14G Mar 2 03:31 200k-split-compacted.lmdb
-rw-r--r-- 1 samueldy samueldy 34G Mar 2 03:29 200k-split.lmdb
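The 14G figure is roughly what the stat output predicts. As a back-of-the-envelope sketch, assuming the branch, leaf, and overflow pages of the main database dominate the file (meta pages and the free-list database are ignored), the space in use is about psize times the total page count. The function name used_bytes is hypothetical; the numbers are copied from the env.stat() output above.

```python
def used_bytes(stat):
    """Estimate the bytes occupied by the main database's pages."""
    pages = stat["branch_pages"] + stat["leaf_pages"] + stat["overflow_pages"]
    return stat["psize"] * pages


# Values taken from the env.stat() output shown in the answer
stat = {
    "psize": 4096,
    "depth": 3,
    "branch_pages": 19,
    "leaf_pages": 2228,
    "overflow_pages": 3600000,
    "entries": 200000,
}

estimate = used_bytes(stat)
print(estimate, round(estimate / 2**30, 1))  # bytes, then GiB
```

This comes to about 13.7 GiB, which ls -h rounds up to 14G. The remaining ~20G in the original file is then presumably pages freed by the deletions, which a compacting copy leaves out.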