How to optimize imports in parallel python processes

Let us say that I have a Python program that processes JSON text files. Using multiprocessing.Pool, I'm going to chew through hundreds of thousands of JSON files; this will usually take several days.

I have two scripts. There is a master.py script that spawns worker processes. Each worker runs an outside program and pipes the result to another Python script, via subprocess.run:

otherProgram {args} | pipe.py {more args}

master.py does not directly spawn the pipe.py process; that is done by an OS call, so the modules that I import into master.py are not shared with the imports required by pipe.py.
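To illustrate the point about imports not being shared, here is a minimal sketch (not the actual master.py or pipe.py) in which a parent imports tarfile and then asks a freshly launched interpreter whether it already has that module loaded. The helper name child_has_tarfile is hypothetical, and the -I and -S flags are only there to keep site-specific startup imports out of the demonstration.

```python
import subprocess
import sys
import tarfile  # imported in the parent, as master.py might do

# One-liner run by the child: report whether tarfile is already loaded.
CHECK = "import sys; print('tarfile' in sys.modules)"


def child_has_tarfile() -> bool:
    """Launch a new interpreter and ask whether it inherited our import."""
    result = subprocess.run(
        # -I (isolated) and -S (skip site) are assumptions made only to
        # keep the demo independent of local startup customisation.
        [sys.executable, "-I", "-S", "-c", CHECK],
        stdout=subprocess.PIPE,
        check=True,
        text=True,
    )
    return result.stdout.strip() == "True"


print("parent has tarfile:", "tarfile" in sys.modules)
print("child has tarfile: ", child_has_tarfile())
```

Every interpreter started through an OS call begins with its own empty module table, so each pipe.py launch repeats its imports from scratch.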

On a 48-node machine, I have run up to 44 worker processes in parallel (i.e. multiprocessing.Pool(44, maxtasksperchild=10)).

So I have two related questions regarding the import process:

  • Is there any way that importing mostly standard libraries, and one custom module with a few helper functions, is going to negatively impact performance, in a way that I might notice?

  • If so, is there a way to optimize imports for the pipe.py process? Instead of importing os and binascii and tarfile hundreds of thousands of times, is there some way to make the same set of imports available to multiple processes?
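One possible way to avoid repeating the imports, sketched below under stated assumptions, is to stop launching pipe.py as a separate interpreter and instead call its logic as a function inside each Pool worker. The worker still runs the outside program with subprocess.run, but captures its output and processes it in-process, so module-level imports happen once per worker rather than once per file. The names handle_output, process_one and FAKE_CMD are hypothetical, and FAKE_CMD is only a stand-in for otherProgram.

```python
import binascii  # stand-ins for pipe.py's imports; loaded once per worker
import json
import subprocess
import sys
from multiprocessing import Pool

# Stand-in for `otherProgram {args}`: a child that prints one JSON object.
FAKE_CMD = [sys.executable, "-c", "print('{\"key\": 1}')"]


def handle_output(raw: bytes) -> int:
    """Hypothetical replacement for pipe.py's body: count top-level keys."""
    return len(json.loads(raw))


def process_one(cmd):
    """Run the outside program and process its output in this worker."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return handle_output(result.stdout)


if __name__ == "__main__":
    # Small pool for the demo; the post uses Pool(44, maxtasksperchild=10).
    with Pool(2, maxtasksperchild=10) as pool:
        print(pool.map(process_one, [FAKE_CMD] * 4))
```

Whether this is worth the restructuring depends on how large the import cost is relative to the per-file work, which is what the measurement below tries to establish.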

I tested this with an executable script (import_stuff.py) that had the same imports as the pipe.py in question.

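import subprocess, multiprocessing, datetime

def proc():
    # Launch the script through the shell, one new interpreter per call
    subprocess.run('/path/to/import_stuff.py', shell=True)

# Same pool settings as described above
p = multiprocessing.Pool(44, maxtasksperchild=10)
then = datetime.datetime.now()

for _ in range(1000):
    p.apply_async(proc)

# Stop accepting work and wait for every queued launch to finish
p.close()
p.join()

delta = datetime.datetime.now() - then
print(delta)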
  • For 1000 iterations: 0:00:00.665622
  • For 10000 iterations: 0:00:05.981406
  • For 100000 iterations: 0:01:10.125345

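The 100,000 launches added about 70 seconds in total, which is roughly 0.7 ms of wall-clock time per launch. Since 100,000 iterations of my program take hours to run, the overhead of repeatedly importing things is not significant.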
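The arithmetic behind that conclusion can be checked with a short sketch that converts the three timings reported above into a per-launch cost. The post only says the real job takes "hours", so the two-hour total used for the percentage is an assumption chosen as the smallest value consistent with that wording, not a measured figure.

```python
from datetime import timedelta

# Wall-clock totals reported above, keyed by number of launches.
measured = {
    1_000: timedelta(seconds=0, microseconds=665_622),
    10_000: timedelta(seconds=5, microseconds=981_406),
    100_000: timedelta(minutes=1, seconds=10, microseconds=125_345),
}


def per_launch_ms(total: timedelta, launches: int) -> float:
    """Average wall-clock cost of one launch, in milliseconds."""
    return total.total_seconds() * 1000 / launches


for launches, total in measured.items():
    print(f"{launches:>7} launches: {per_launch_ms(total, launches):.3f} ms each")

# ASSUMPTION: total runtime of the real job for 100,000 files.
assumed_job = timedelta(hours=2)
share = measured[100_000] / assumed_job  # timedelta / timedelta gives a float
print(f"import overhead share of an assumed 2 h run: {share:.2%}")
```

Under that assumption the launch-and-import overhead is on the order of one percent of the run, and it shrinks further if the real job takes longer.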
