Copying LMDB to another LMDB reduces file size

To shuffle the data in an already existing LMDB (trying to solve this problem), I retrieved the data, shuffled it, and wrote it back to a new LMDB. But when I checked the LMDB file size, it was reduced: the old LMDB file size is 3792896 bytes, but the new LMDB file size is 2314240 bytes.

Python code implemented:

import lmdb
from random import shuffle

lst_data = []

# Read every key/value pair out of the existing LMDB
env = lmdb.open('val_3', readonly=True)
with env.begin() as txn:
    cursor = txn.cursor()
    for key, value in cursor:
        lst_data.append([key, value])

shuffle(lst_data)

# Write the shuffled values to a new LMDB under fresh sequential keys
env1 = lmdb.open('mod_val_3')
with env1.begin(write=True) as txn1:
    for i, (_key, value) in enumerate(lst_data):
        str_id = '{:08}'.format(i)
        txn1.put(str_id.encode('ascii'), value)

The code is adapted from the reference here. Any suggestions or ideas would be helpful.

You can use mdb_stat to see the number of entries in the database; this should confirm whether your copy worked correctly.
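A minimal sketch of the same check from Python (assuming the val_3 and mod_val_3 paths from the question), using env.stat(), which reports the same entry count that mdb_stat prints:

import lmdb

# 'entries' is the number of records in the main database
for path in ('val_3', 'mod_val_3'):
    with lmdb.open(path, readonly=True) as env:
        print(path, env.stat()['entries'])

If both environments print the same count, the copy preserved every record.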

Newer versions of the lmdb Python wrapper (at least as of 1.3.0) include an environment copy method, which has a compact option that appears to do what @Ravi was trying to do. With compact=True, the copy writes only the live pages to the destination file and omits free pages, which is why the output can be much smaller. Use it like this (adjusting the lmdb.open parameters as necessary):

# Copy old database into new one with compacting
# Old database is ~34G from deleting 200k of 400k original records
with lmdb.open(
    "200k-split.lmdb",
    map_size=109951162777,
    subdir=False,
    meminit=False,
    map_async=True,
) as env:
    env.copy(path="200k-split-compacted.lmdb", compact=True)

You can then verify that the compacted file has the same number of records as the original file...

with lmdb.open(
    "200k-split.lmdb",
    map_size=109951162777,
    subdir=False,
    meminit=False,
    map_async=True,
) as env:
    print(env.stat())

# {'psize': 4096, 'depth': 3, 'branch_pages': 19, 
#  'leaf_pages': 2228, 'overflow_pages': 3600000, 'entries': 200000}
with lmdb.open(
    "200k-split-compacted.lmdb",
    map_size=109951162777,
    subdir=False,
    meminit=False,
    map_async=True,
) as env:
    print(env.stat())

# {'psize': 4096, 'depth': 3, 'branch_pages': 19,
#  'leaf_pages': 2228, 'overflow_pages': 3600000, 'entries': 200000}

...but a vastly smaller file size.

> ls -lah *.lmdb
-rw-rw-r-- 1 samueldy samueldy 14G Mar  2 03:31 200k-split-compacted.lmdb
-rw-r--r-- 1 samueldy samueldy 34G Mar  2 03:29 200k-split.lmdb
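As a rough sanity check, the compacted size is close to the live page count times the page size from the stat output above: (3600000 overflow + 2228 leaf + 19 branch) pages × 4096 bytes ≈ 13.7 GiB, which matches the 14G that ls reports. The remaining ~20G in the original file was presumably free pages left behind by the deleted records, which the compacting copy drops.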
