
Python multiprocessing and sys.argv

Are sys.argv values passed to the branches of multiprocessing? What is the correct way of passing argv to all branches of a multiprocess program?

Let's suppose I have two files: test1.py:

import sys
if len(sys.argv) > 1:
    env = sys.argv[1]
else:
    env = 'test'

And main_code.py:

from test1 import *
import concurrent.futures


def f():
    if env == 'test':
        print('bu')
    else:
        print('not bu')

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        for i in range(2):
            executor.submit(f)

I invoke main_code.py from cmd: python main_code.py zzz. Is the sys.argv[1] variable (which is 'zzz') passed on each invocation of executor.submit(f), as it was first obtained from the import of test1.py? My confusion comes from the fact that concurrent.futures basically creates separate instances of the code by re-importing all the files.

On Windows, the spawn context is the only way to create worker processes.

  1. sys.argv is copied to worker processes once.

  2. Not all files are re-imported. Only the modules required to unpickle the task function and its arguments are imported.

  3. In the worker, the original __main__ is actually called __mp_main__. After copying sys.argv, the worker imports __mp_main__, which imports test1, so env is set correctly.

  4. Though multiprocessing tries to keep the environment similar, the worker process entry point is somewhere inside multiprocessing.spawn. Several items are replicated there: sys.argv, sys.path, and os.getcwd(). See get_preparation_data() and prepare() for details; there is an inspection sketch after this list.

  5. It can be verified with Task Manager or the ps command that the worker process is started with different arguments.
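
For point 4, the preparation data can be inspected directly. A minimal sketch, relying on the undocumented CPython helper multiprocessing.spawn.get_preparation_data(), whose exact contents may vary between Python versions:

import multiprocessing.spawn
import sys

if __name__ == '__main__':
    # Build the dict that would be pickled and sent to a spawned worker.
    data = multiprocessing.spawn.get_preparation_data('demo')
    print(sorted(data))                      # includes sys_argv and sys_path
    print(data.get('sys_argv') == sys.argv)  # the worker receives a copy of argv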

I wrote a simple script called mp.py that prints the arguments; run it with python3 mp.py hello world.

Output:

29836 process ['C:/xxxx/stackoverflow/mp.py'] <module '__main__' from 'C:/xxxx/stackoverflow/mp.py'>
29836 my name is main
29836 true main <module '__main__' from 'C:/xxxx/stackoverflow/mp.py'>
18464 process ['C:\\xxxx\\stackoverflow\\mp.py'] <module '__main__' (built-in)>
18464 worker <module '__mp_main__' from 'C:\\xxxx\\stackoverflow\\mp.py'>

mp.py:

from __future__ import annotations

import multiprocessing
import os
import sys
import time
from concurrent.futures import ProcessPoolExecutor


def list_modules(who_am_i):
    the_main = sys.modules.get('__main__')
    print(os.getpid(), who_am_i, the_main)


def main():
    list_modules('true main')
    mp_context = multiprocessing.get_context('spawn')
    # mp_context = multiprocessing.get_context('fork')
    # mp_context = multiprocessing.get_context('forkserver')
    with ProcessPoolExecutor(1, mp_context=mp_context) as executor:
        executor.submit(list_modules, 'worker').result()

        # Keep the pool alive so the worker's command line can be
        # inspected with Task Manager or ps (see point 5 above).
        time.sleep(100)


# This message is printed when this module is loaded (not in fork workers, once in forkserver, multiple times in spawn).
print(os.getpid(), "process", sys.argv, sys.modules.get('__main__'))

if __name__ == '__main__':
    # Print once in the main process
    print(os.getpid(), "my name is main")
    main()
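
As for the correct way of passing a value to all workers: rather than reading sys.argv at import time, it is more robust to pass the value explicitly as a task argument. A minimal sketch:

import sys
from concurrent.futures import ProcessPoolExecutor


def f(env):
    # env arrives pickled with the task, so it does not depend on
    # sys.argv being replicated in the worker process.
    print('bu' if env == 'test' else 'not bu')


if __name__ == '__main__':
    env = sys.argv[1] if len(sys.argv) > 1 else 'test'
    with ProcessPoolExecutor(max_workers=2) as executor:
        for _ in range(2):
            executor.submit(f, env)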
