
UserWarning: MongoClient opened before fork. Create MongoClient only after forking

I'm using mongoengine in a project where I need to spawn various Process objects from the multiprocessing library. The main process uses the database to determine what needs to be done, then spawns a new Process to take care of it. Each Process needs its own access to the database. I'd prefer to keep using Process objects because the workers stay isolated and killable.

I've structured it so that each process calls mongoengine.connect(name, host=host, port=port) when it starts, yet I'm still getting the warning:

UserWarning: MongoClient opened before fork. Create MongoClient only after forking

The warning implies I cannot open a MongoClient before I spawn a new Process. I haven't run into any adverse effects yet, but I would rather have a robust app.

What I'd like is a pattern I could apply to a problem like this:

import mongoengine
from multiprocessing import Process

mongoengine.connect(dbname, host=host, port=port)

def worker_process():
    mongoengine.connect(dbname, host=host, port=port)
    # Fetch lots of web data via bs4
    # do some work with the DB
    mongoengine.disconnect()

# do some work with the DB

process = Process(target=worker_process)
process.start()

# do some work with the DB

process.join()

This is dramatically simplified from what I'm actually doing: an endless loop that checks when it should spawn many processes.

I've read about adding connect=False to the connect parameters, but as far as I can tell it only makes the socket connection lazy; if I start using the database before I create a new Process, it will still have the same issue.

I've also thought about creating a unique alias in each Process (I could just use the PID as the alias name), but I can't seem to find a way to tell that process to use that alias.

I have two last resorts:

  1. Move to threading and lose the ability to kill the threads. I'm also not nearly as familiar with the gotchas of threading. Processes have always been useful for my purposes and strike me as much cleaner. I know they are heavier weight, but I'm not concerned about raw performance so much as robustness. Maybe someone could talk me into this with an "If you just... you won't have to worry about threading."

  2. Create a dedicated Process that uses the DB and determines what to do, reporting tasks back up to the main process, which then spawns new Process objects. That way the main process would never hold a mongoengine connection. I really don't want to go down this path unless I have to, as it would be a significant restructure.
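Whichever route you take, the pattern the warning is pushing toward is "create the client inside the child, never inherit it from the parent." A minimal stdlib-only sketch of that shape, where make_client is a hypothetical stand-in for mongoengine.connect(dbname, host=host, port=port):

```python
import os
from multiprocessing import Process, Queue

def make_client():
    # Hypothetical stand-in for mongoengine.connect(...).
    # What matters is *where* it is called, not what it returns.
    return {"owner_pid": os.getpid()}

def worker(results):
    # The client is created only after the child process starts,
    # so no MongoClient state is inherited across fork().
    client = make_client()
    results.put(client["owner_pid"])

if __name__ == '__main__':
    results = Queue()
    p = Process(target=worker, args=(results,))
    p.start()
    p.join()
    child_pid = results.get()
    # The client created in the child belongs to the child, not the parent.
    assert child_pid != os.getpid()
```

The parent can still hold its own client for its own queries; the rule is only that each process constructs its client after it starts.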

Never fails: do a crapload of research and turn up nothing, then ask the question and the next thing you know the answer falls in your lap.

Turns out that my Docker container was having the issue while my desktop wasn't. I use macOS for development, and I stumbled across this footnote in the multiprocessing documentation:

Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725.

So on my desktop I was testing/validating on Python 3.8, which was using the spawn method without my even realizing it, while the Linux container still defaulted to fork. I just added a line to force spawn on startup:

import multiprocessing as mp

# bla bla bla

if __name__ == '__main__':
    mp.set_start_method('spawn')

and everything started working.
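As a note, set_start_method can only be called once per program. If flipping the global default is too invasive, multiprocessing also offers get_context, which binds the spawn method to just the processes that need it:

```python
import multiprocessing as mp

# A context object is bound to one start method; using it does not
# change the program-wide default the way set_start_method does.
ctx = mp.get_context('spawn')

# Processes created from this context use spawn, e.g.:
#   p = ctx.Process(target=worker)
print(ctx.get_start_method())  # 'spawn'
```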

Moral of the story: if you want to use MongoEngine/PyMongo with the multiprocessing library, make sure you set your start method to spawn. It costs a little performance when creating each process, but it gives you a more robust separation between them.

