
`concurrent.futures.ProcessPoolExecutor` on Python runs from the beginning of the file instead of the defined function

I'm having trouble with concurrent.futures. For some background: I'm trying to do massive image manipulation with python-opencv2. I ran into a performance problem, which is painful considering that processing just a few hundred images can take hours. I found a solution in concurrent.futures, which lets me use multiple CPU cores to speed things up (while the processing was slow, it only used about 16% of my 6-core processor, i.e. roughly one core). So I wrote the code, but then I noticed that the multiprocessing actually starts from the beginning of the file instead of being isolated to the function I created. Here's a minimal reproduction of the error:

import glob
import concurrent.futures
import cv2
import os

def convert_this(filename):
    ### Read in the image data
    img = cv2.imread(filename)
    
    ### Convert the image to grayscale
    res = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    cv2.imwrite(os.path.join("output", os.path.basename(filename)), res)

try:
    # create output dir
    os.mkdir("output")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        files = glob.glob("../project/temp/*")
        # consume the results so exceptions raised in workers are re-raised here
        list(executor.map(convert_this, files))
except Exception as e:
    print("Encountered Error!")
    print(e)
    filelist = glob.glob("output/*")
    for f in filelist:
        os.remove(f)
    os.rmdir("output")

It gave me this error:

Encountered Error!
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
  File "M:\pythonproject\testfolder\test.py", line 17, in <module>
    os.mkdir("output")
[WinError 183] Cannot create a file when that file already exists: 'output'
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
  File "M:\pythonproject\testfolder\test.py", line 17, in <module>
    os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\<username>\Anaconda3\envs\py37\lib\multiprocessing\spawn.py", line 105, in spawn_main
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
  File "M:\pythonproject\testfolder\test.py", line 17, in <module>
    os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
...
(it kept repeating the same "Cannot create a file" error)

As you can see, os.mkdir was run even though it's outside the convert_this function I defined. I'm not new to Python, but I am definitely new to multiprocessing and threading. Is this just how concurrent.futures behaves, or have I missed something in the documentation?

Thanks.

Yes. With the spawn start method (the default on Windows, which is why spawn.py appears in your traceback), each worker is a fresh Python interpreter that must load your file before it can run the function, just as when you run the file yourself, so it executes all of the top-level code you have written. So either (1) move your multiprocessing code into a separate file with nothing extra in it and call that, or (2) put your top-level code in a function (e.g., `main()`) and, at the bottom of your file, write:

if __name__ == "__main__":
    main()

The guarded code runs only when you start the script yourself, not when the worker processes load the file, because in those processes `__name__` is not `"__main__"`. See the Python docs for details on this construct.
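A minimal sketch of option (2) applied to the structure of the question's script. To keep it runnable without OpenCV, `convert_this` here is a hypothetical stand-in that copies each file instead of converting it to grayscale; the `pattern` and `out_dir` parameters of `main()` are also additions for illustration, not part of the original code.

```python
import concurrent.futures
import glob
import os
import shutil


def convert_this(filename, out_dir):
    # Hypothetical stand-in for the question's OpenCV work (imread, cvtColor,
    # imwrite): it only copies the file, so this sketch runs without cv2.
    dest = os.path.join(out_dir, os.path.basename(filename))
    shutil.copyfile(filename, dest)
    return dest


def main(pattern="../project/temp/*", out_dir="output"):
    # Everything in main() runs once, in the parent process only.
    try:
        # exist_ok=True so that re-running the script does not fail
        os.makedirs(out_dir, exist_ok=True)
        files = glob.glob(pattern)
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # list() consumes the results, so exceptions raised in a worker
            # are re-raised here instead of being silently dropped
            return list(executor.map(convert_this, files, [out_dir] * len(files)))
    except Exception as e:
        print("Encountered Error!")
        print(e)
        # remove the (possibly partially filled) output directory
        shutil.rmtree(out_dir, ignore_errors=True)
        return []


if __name__ == "__main__":
    # Worker processes load this file under a different __name__, so this
    # call (and the directory creation inside main) happens only once.
    main()
```

The design point is that nothing with side effects (creating directories, globbing, starting the pool) sits at module level; only imports and function definitions do, and those are harmless for the workers to re-execute.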


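Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.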

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM