Using sys.stdout.write with pool.map for multiprocessing - sharing sys.stdout

This is probably something very simple that I'm missing.

Why can't I use pool.map(sys.stdout.write, iterable)?

I can use pool.map(len, iterable) with the same iterable, but when I use sys.stdout.write I get the following exception:

TypeError: expected string or Unicode object, NoneType found

This is the traceback:

Traceback (most recent call last):
  File "/home/reut/python/print_mult.py", line 19, in <module>
    pool.map(sys.stdout.write, messages)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: expected string or Unicode object, NoneType found

Full code:

#!/usr/bin/env python

import multiprocessing
import sys

# pool of 10 workers
pool = multiprocessing.Pool(10)
messages = ["message #%d\n" % i for i in range(100)]
print messages
pool.map(sys.stdout.write, messages) # doesn't work - error
# print pool.map(len, messages) # works

Edit #1 - ThreadPool works:

When I use ThreadPool (from multiprocessing.pool) it works, so I suppose it has something to do with not being able to share the sys.stdout stream across processes.
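ThreadPool workers are threads inside the same process, so the bound method is handed to them as an ordinary object and nothing needs to be pickled. A minimal Python 3 sketch of that, assuming the order of the output lines does not matter; write_all_threaded is a hypothetical helper name:

```python
import sys
from multiprocessing.pool import ThreadPool


def write_all_threaded(messages, stream=None, workers=4):
    """Write every message to `stream` (default: sys.stdout) from a thread pool.

    Threads share the parent's memory, so stream.write is passed to the
    workers directly; nothing has to be pickled.
    """
    stream = sys.stdout if stream is None else stream
    pool = ThreadPool(workers)
    try:
        # map() returns whatever stream.write returns for each message
        # (the number of characters written, for Python 3 text streams).
        return pool.map(stream.write, messages)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    write_all_threaded(["message #%d\n" % i for i in range(10)])
```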

Edit #2 - manually created processes work as well:

from multiprocessing import Process
import sys

# 10 manually created processes (no Pool involved)
processes = []
for i in range(10):
    processes.append(Process(target=sys.stdout.write, args=("I am process %d\n" % i, )))

for p in processes:
    p.start()

for p in processes:
    p.join()

So now I'm confused, because the only difference I know of between a regular process and a pool process is the point at which it forks. I'm not sure how that's relevant here. The only thing I can think of is that map stores the target internally and cannot share it with the workers the way the Process constructor does when I create the processes manually.

The real error is hidden: every task sent to a pool worker, including the function to call, has to be pickled, and sys.stdout.write cannot be. You can only pass a function that is directly referable from a module namespace. However, in some circumstances there are ways to get around this limitation. Unix has a special feature whereby a process can be forked and all its memory duplicated. This is how instance methods can be 'passed' to a child process: nothing is actually passed. On Windows, processes cannot be forked and must be spawned instead, which means a new interpreter is started. To run the given function, the new interpreter is sent the name of the function and the module it is located in; it imports the module, looks up the function by name, and then runs it.
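The "directly referable from a module namespace" rule is pickle's rule for functions: a function is stored as a reference (module plus name) and looked up again when it is unpickled. A small Python 3 sketch of that behavior; can_pickle is a hypothetical helper, and a file opened on os.devnull stands in for sys.stdout so the check does not depend on how stdout happens to be set up:

```python
import os
import pickle


def can_pickle(obj):
    """Return True if `obj` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        return False
    return True


# Module-level functions are stored by reference: only the module and the
# name go into the pickle, and unpickling imports the module and looks it up.
assert pickle.loads(pickle.dumps(os.path.join)) is os.path.join

# A bound method of an open file drags the file object along with it,
# and file objects cannot be pickled.
with open(os.devnull, "w") as sink:
    print(can_pickle(len))          # True
    print(can_pickle(sink.write))   # False
    print(can_pickle(lambda s: s))  # False: a lambda has no importable name
```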

For a process that is part of a pool, the process has already been started, so it cannot benefit from forking to receive a copy of the function or method to run. Instead, it must use the same technique as a spawned process. This is why your second edit works but the pool does not.
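Since pool workers are already running when map() is called, the function has to be pickled on the way to them, and that pickling step is where the hidden error comes from. A rough Python 3 sketch that surfaces the problem up front with a clearer message; checked_map and shout are hypothetical names, and the plain pickle.dumps call only approximates the pickler multiprocessing uses internally:

```python
import multiprocessing
import pickle


def checked_map(pool, func, iterable):
    """Like pool.map, but report an unpicklable `func` before submitting work."""
    try:
        pickle.dumps(func)
    except Exception as exc:
        raise TypeError(
            "%r cannot be sent to a worker process (%s); "
            "pass a module-level function instead" % (func, exc)
        ) from exc
    return pool.map(func, iterable)


def shout(message):
    # Module-level, so it is pickled by name and found again in the worker.
    return message.upper()


if __name__ == "__main__":
    pool = multiprocessing.Pool(2)
    try:
        print(checked_map(pool, shout, ["a\n", "b\n"]))
    finally:
        pool.close()
        pool.join()
```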

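The easiest way to get around your problem is to make print a function rather than a statement. With from __future__ import print_function, print becomes the built-in function, which can be pickled by name and passed to pool.map.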

from __future__ import print_function

import multiprocessing
import sys

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    messages = ["message #%d\n" % i for i in range(5)]
    print(messages) # <- notice the brackets around the arguments to print
    pool.map(print, messages)
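Because each message already ends in "\n", print adds a second newline and the output comes out double-spaced. One way around that, sketched here for Python 3 (print_no_newline is a hypothetical name), is to bind end="" with functools.partial; a partial object can be pickled as long as the function and arguments it wraps can be:

```python
import functools
import multiprocessing

# print is a builtin, so it is pickled by name; the partial just carries the
# extra keyword arguments along. flush=True writes each message right away
# instead of when the worker process exits.
print_no_newline = functools.partial(print, end="", flush=True)


if __name__ == "__main__":
    messages = ["message #%d\n" % i for i in range(5)]
    pool = multiprocessing.Pool(2)
    try:
        pool.map(print_no_newline, messages)
    finally:
        pool.close()
        pool.join()
```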

Failing that, you can define a function that does the printing for you and pass it to map.

import multiprocessing
import sys

# Both helpers are module-level functions, so pool.map can pickle them by name.
def stdout_write(arg):
    sys.stdout.write(arg)

def stdout_print(arg):
    print arg

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    messages = ["message #%d\n" % i for i in range(5)]
    print messages
    pool.map(stdout_print, messages)
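Another option avoids sharing sys.stdout at all: the workers return the text and only the parent writes it. pool.map returns results in input order, so the output order is deterministic too. A Python 3 sketch; format_message and the upper-casing are placeholders for whatever real work each worker does:

```python
import multiprocessing
import sys


def format_message(message):
    # Placeholder work done in the worker; it returns text instead of
    # writing to sys.stdout itself.
    return message.upper()


def main():
    messages = ["message #%d\n" % i for i in range(5)]
    pool = multiprocessing.Pool(2)
    try:
        results = pool.map(format_message, messages)
    finally:
        pool.close()
        pool.join()
    # Only the parent touches sys.stdout, so lines never interleave.
    sys.stdout.write("".join(results))


if __name__ == "__main__":
    main()
```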

I'm not sure why, exactly, but pool.map() seems to require the function to return a string.

This simple change to your program runs correctly.

#!/usr/bin/env python

import multiprocessing
import sys

def prn(s):
    sys.stdout.write(s)
    return ''

# pool of 10 workers
pool = multiprocessing.Pool(10)
messages = ["message #%d\n" % i for i in range(100)]
print messages
pool.map(prn, messages) # works
# print pool.map(len, messages) # works

I checked the documentation and didn't see this requirement, so I don't know why it is being enforced.
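What the wrapper changes is that prn is a module-level function, which pickle can send to the workers by name; the value it returns can be any picklable object. A Python 3 sketch with two hypothetical variants of prn, one returning '' as above and one with no return statement at all:

```python
import multiprocessing
import sys


def prn_returning(s):
    sys.stdout.write(s)
    return ''


def prn_silent(s):
    # No return statement, so each call returns None.
    sys.stdout.write(s)


if __name__ == "__main__":
    messages = ["message #%d\n" % i for i in range(4)]
    pool = multiprocessing.Pool(2)
    try:
        print(pool.map(prn_returning, messages))  # ['', '', '', '']
        print(pool.map(prn_silent, messages))     # [None, None, None, None]
    finally:
        pool.close()
        pool.join()
```

pool.map collects whatever the workers return, so the second call simply yields a list of None.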

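Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.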

 
© 2020-2024 STACKOOM.COM