
Mongorestore seems to run out of memory and kills the mongo process

In the current setup there are two Mongo Docker containers, running on hosts A and B, with Mongo version 3.4, running as a replica set. I would like to upgrade them to 3.6 and add a member, so the containers would run on hosts A, B and C. The containers have an 8GB memory limit and no swap allocated (currently), and are administered in Rancher. So my plan was to boot up the three new containers, initialize a replica set for them, take a dump from the 3.4 container, and restore it to the new replica set's master.
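
Roughly, the dump and restore commands I have in mind look like this (host names and paths are just placeholders for illustration):

# dump from the current 3.4 primary
mongodump --host old-primary:27017 --gzip --out /backup/dump

# restore into the primary of the new 3.6 replica set
mongorestore --host new-primary:27017 --gzip /backup/dump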

Taking the dump went fine, and its size was about 16GB. When I try to restore it to the new 3.6 master, the restore starts fine, but after roughly 5GB of the data has been restored, the mongo process seems to get killed by the OS/Rancher: the container itself doesn't restart, but the MongoDB process crashes and then reloads itself. If I run mongorestore against the same database again, it reports a unique key error for every entry that was already inserted and then continues where it left off, only to do the same thing again after another 5GB or so. So it seems that mongorestore loads all the entries it restores into memory.
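
For what it's worth, an OOM kill by the kernel should show up in the host's kernel log (exact wording varies by kernel), e.g.:

dmesg -T | grep -i -E 'out of memory|killed process'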

So I need to find some solution to this. The options I can think of are:

  1. Every time it crashes, just rerun the mongorestore command so it continues where it left off. It should probably work, but I feel a bit uneasy doing it.
  2. Restore the database one collection at a time, but the largest collection is bigger than 5GB, so that wouldn't work properly either.
  3. Add swap or physical memory (temporarily) to the container so the process doesn't get killed once it runs out of physical memory.
  4. Something else, hopefully a better solution?

Increasing the swap size, as the other answer pointed out, worked for me. Also, the --numParallelCollections option controls the number of collections mongodump/mongorestore will dump/restore in parallel. The default is 4, which may consume a lot of memory.
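
For example, restoring one collection at a time keeps the memory footprint down (host name and dump path are placeholders):

mongorestore --host new-primary:27017 --numParallelCollections=1 --gzip /backup/dump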

Since it sounds like you're not running out of disk space (mongorestore successfully continues where it left off), focusing on memory is the right call: you're almost certainly running out of memory during the mongorestore process.

I would highly recommend going with swap space, as this is the simplest, most reliable, least hacky, and arguably the most officially supported way to handle this problem.
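
If the host doesn't have swap yet, a temporary swap file for the duration of the restore is enough; a minimal sketch (size and path are just an example, run as root on the Docker host):

fallocate -l 8G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# and once the restore is done:
swapoff /swapfile
rm /swapfile

Note that if the container was started with an equal memory and memory+swap limit (Docker's --memory-swap), that limit may also need raising before the container can actually use the swap.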

Alternatively, if you're for some reason completely opposed to using swap space, you could temporarily use a node with more memory, perform the mongorestore on that node, allow it to replicate, then take the node down and replace it with one that has fewer resources. This should work, but can become quite painful with larger data sets and is pretty much overkill for something like this.

Just documenting here my experience in 2020 using mongodb 4.4:

I ran into this problem restoring a 5GB collection on a machine with 4GB of memory. I added 4GB of swap, which seemed to work: I was no longer seeing the KILLED message.

However, a while later I noticed I was missing a lot of data! It turns out that if mongorestore runs out of memory during the final step (at 100%), it will not show "killed", BUT IT HASN'T IMPORTED YOUR DATA.

You want to make sure you see these final lines:

[########################]  cranlike.files.chunks  5.00GB/5.00GB  (100.0%)
restoring indexes for collection cranlike.files.chunks from metadata
finished restoring cranlike.files.chunks (23674 documents, 0 failures)
34632 document(s) restored successfully. 0 document(s) failed to restore.

In my case I needed 4GB of memory + 8GB of swap to import a 5GB GridFS collection.
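
It's also worth comparing document counts between the source and the restored node before trusting the result, e.g. in the mongo shell (the database and collection names here are the ones from my case):

use cranlike
db.files.chunks.countDocuments({})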

Rather than starting up a new replica set, it's possible to do the entire expansion and upgrade without even going offline.

  1. Start MongoDB 3.6 on host C
  2. On the primary (currently A or B), add node C into the replica set
  3. Node C will do an initial sync of the data; this may take some time
  4. Once that is finished, take down node B; your replica set still has two working nodes (A and C), so it will continue uninterrupted
  5. Replace v3.4 on node B with v3.6 and start back up again
  6. When node B is ready, take down node A
  7. Replace v3.4 on node A with v3.6 and start back up again
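
A rough sketch of the shell commands behind steps 2-4, assuming the hosts are reachable as hostA/hostB/hostC on the default port:

// step 2: on the current primary
rs.add("hostC:27017")
// step 3: watch the initial sync until hostC reports SECONDARY
rs.status()
// step 4: on node B, shut it down cleanly
use admin
db.shutdownServer()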

You'll be left with the same replica set running as before, but now with three nodes, all running v3.6.

PS Be sure to check out the documentation on Upgrade a Replica Set to 3.6 before you start.

I ran into a similar issue running 3 nodes on a single machine (8GB RAM total) as part of testing a replica set. The default storage cache size is 0.5 * (total RAM - 1GB). During the restore, mongorestore caused each node to use its full cache size and consume all available RAM.
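
With 8GB of total RAM that default works out to 0.5 * (8GB - 1GB) = 3.5GB of cache per mongod, so three nodes on one machine will try to claim roughly 10.5GB of cache between them, more than the machine has.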

I am using Ansible to template this part of mongod.conf, but you can set cacheSizeGB to any reasonable amount so that multiple instances do not consume all the RAM.

storage:
    wiredTiger:
        engineConfig:
            # roughly 20% of the host's total RAM, converted from MB to GB
            cacheSizeGB: {{ ansible_memtotal_mb / 1024 * 0.2 }}

I solved the OOM problem by using the --wiredTigerCacheSizeGB parameter of mongod. Excerpt from my docker-compose.yaml below:

version: '3.6'
services:
    db:
        container_name: db
        image: mongo:3.2
        volumes:
            - ./vol/db/:/data/db
        restart: always
        # use 1.5GB for cache instead of the default (Total RAM - 1GB)/2:
        command: mongod --wiredTigerCacheSizeGB 1.5
