
Python multiprocessing pool.map raises IndexError

I've developed a utility using Python/Cython that sorts CSV files and generates stats for a client, but invoking pool.map seems to raise an exception before my mapped function has a chance to execute. Sorting a small number of files works as expected, but once the number of files grows to around 10, I get the IndexError below after calling pool.map. Does anyone recognize this error? Any help is greatly appreciated.

While the code is under NDA, the use case is fairly simple:

Code Sample:

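import multiprocessing

def sort_files(csv_files):
    # One worker process per CPU core
    pool_size = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=pool_size)
    # chunksize=1: each file is dispatched to a worker as its own task
    sorted_dicts = pool.map(sort_file, csv_files, 1)
    return sorted_dicts

def sort_file(csv_file):
    # Parenthesized form works under both Python 2 and Python 3
    print('sorting %s...' % csv_file)
    # sort code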

Output:

File "generic.pyx", line 17, in generic.sort_files (/users/cyounker/.pyxbld/temp.linux-x86_64-2.7/pyrex/generic.c:1723)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
IndexError: list index out of range

The IndexError is raised somewhere inside sort_file(), i.e. in a subprocess, and is then re-raised by the parent process. Apparently multiprocessing (at least in Python 2.7, as in the traceback above) makes no attempt to tell us where the error really comes from (e.g. on which line it occurred), or even which argument to sort_file() caused it. I hate multiprocessing even more :-(
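One common workaround for the missing location information is to catch the exception inside the worker and re-raise it with the formatted traceback and the offending argument in the message. The following is only a minimal sketch: WorkerError, call_with_traceback and sort_file_traced are hypothetical names introduced for illustration, and the body of sort_file is a stand-in that fails on purpose, not the original sort code.

```python
import multiprocessing
import traceback


class WorkerError(Exception):
    """Hypothetical exception that carries the worker-side traceback text."""


def sort_file(csv_file):
    # Stand-in for the real sort code (assumption): indexing an empty row
    # reproduces "IndexError: list index out of range".
    row = []
    return row[0]


def call_with_traceback(func, arg):
    # Runs in the worker. On failure, capture the full traceback as text,
    # because only the exception object itself is sent back to the parent.
    try:
        return func(arg)
    except Exception:
        raise WorkerError("%s(%r) failed:\n%s"
                          % (func.__name__, arg, traceback.format_exc()))


def sort_file_traced(csv_file):
    # Module-level wrapper so that the pool can pickle it by name.
    return call_with_traceback(sort_file, csv_file)


def sort_files(csv_files):
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    try:
        return pool.map(sort_file_traced, csv_files, 1)
    finally:
        pool.close()
        pool.join()
```

With this in place, a failure in any worker surfaces in the parent as a WorkerError whose message names the file being processed and the line inside sort_file() that raised. Call sort_files() from under an `if __name__ == '__main__':` guard so that it also works on platforms that spawn rather than fork worker processes.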

Check further up in the command output. In Python 3.4 at least, multiprocessing.pool will helpfully print a RemoteTraceback above the parent process traceback. You'll see something like:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/path/to/your/code/here.py", line 80, in sort_file
    something = row[index]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "generic.pyx", line 17, in generic.sort_files (/users/cyounker/.pyxbld/temp.linux-x86_64-2.7/pyrex/generic.c:1723)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
IndexError: list index out of range

In the case above, the code raising the error is at /path/to/your/code/here.py, line 80.
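The "direct cause" wording in that output corresponds to the exception's `__cause__` attribute, so the worker-side traceback can also be read programmatically, for example to write it to a log. The sketch below assumes Python 3.4 or later and relies on an implementation detail of multiprocessing.pool (the RemoteTraceback attached as `__cause__`), not a documented API; remote_traceback_text is a hypothetical helper and sort_file is a stand-in that fails on purpose.

```python
import multiprocessing


def remote_traceback_text(exc):
    """Return the traceback text attached as exc.__cause__, or None."""
    cause = getattr(exc, "__cause__", None)
    # For exceptions re-raised by Pool in Python 3.4+, str(cause) is the
    # formatted traceback from the worker process.
    return str(cause) if cause is not None else None


def sort_file(csv_file):
    # Stand-in for the real sort code (assumption): always fails.
    row = []
    return row[0]


def sort_files(csv_files):
    pool = multiprocessing.Pool()
    try:
        return pool.map(sort_file, csv_files, 1)
    except Exception as exc:
        text = remote_traceback_text(exc)
        if text:
            print(text)  # or send it to a logger
        raise
    finally:
        pool.close()
        pool.join()
```

The helper returns None when no cause is attached, so the same error handling degrades gracefully for exceptions that were not raised in a worker.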

See also: debugging errors in Python multiprocessing
