
SystemError in Multiprocessing

I am using multiprocessing to execute a function over iterative arguments. When the arrays in the arguments are too long, I get the following error message:

<multiprocessing.pool.Pool object at 0x545912490>
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 342, in _handle_tasks
    put(task)
SystemError: NULL result without error in PyObject_Call

I am trying to compare a sample of arrays to another sample of arrays. The code is:

import numpy as np
import multiprocessing

def func(args):
    i=args[0]
    array=args[1]
    sample=args[2]
    for j in np.arange(len(sample)):
        temp=0
        for element in array:
            temp+=len(np.where(sample[j]==element)[0])
        if temp==len(array):
            print i,j       #indices of identical arrays in the samples
    return 0

def compare_samples(a,b):
    iter=len(a)
    pool=multiprocessing.Pool()
    iter_args=[]
    for i in range(0,iter):
        iter_args.append([i,a[i],b])
    print pool
    pool.map(func,iter_args)
    return 0

N=100000000       #error if this number is too large
sample1=np.random.random_integers(0,9,size=(N,10))
sample2=np.random.random_integers(0,9,size=(N,10))

compare_samples(sample1,sample2)

I found a similar question ( System error while running subprocesses using Multiprocessing ), but the solution there only covers a special case, and I don't see how to apply it generally.

Does anyone know how to fix the error?

Unfortunately, if the answer that you already referenced doesn't work for you, I don't think you're going to find another workaround that allows you to use your current approach. Here is a conversation between a couple of Python devs discussing size limitations of the multiprocessing library, and it doesn't seem like a good solution was reached. It appears that you're butting up against size limitations that don't really have a "fix" without modifying Python itself.
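To see the scale of the problem, here is a rough back-of-the-envelope estimate (not from the original post) of how much data each task in pool.map(func, iter_args) has to carry, assuming numpy's default 64-bit integers. Every task list [i, a[i], b] contains the entire second sample, which by itself is far larger than what Python 2.7's pickle can comfortably push through a single inter-process message.

N = 100000000                          # rows per sample, as in the question
row_bytes = 10 * 8                     # 10 elements per row, int64 by default
sample_bytes = N * row_bytes           # ~8 GB for sample2 alone
task_bytes = row_bytes + sample_bytes  # every [i, a[i], b] task carries all of b
print(task_bytes / 2.0**30)            # roughly 7.5 GiB pickled per task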

Even if that's changed by now, it (most likely) won't be backported to Python 2.7.

I think you have a few potential solutions:

  1. Split up your return data in your worker functions and return it in pieces over a queue. You may need to use a function more like apply_async instead of map. This will allow you to control the size of the data you're pushing between processes at any one time (see the first sketch after this list).

  2. Use the mmap library and write your results into some shared memory that was set up by the main process (see the second sketch after this list).

  3. If you want something that's as simple as can be and aren't too worried about speed, you could simply write your results out to a text file, return the file name, and read the file back in from your main process (see the third sketch after this list).
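A minimal sketch of option 1, assuming the Unix fork start method (the default on OS X with Python 2.7): plain Process workers inherit the big arrays instead of receiving them pickled, and each worker returns its matches in small batches over a multiprocessing.Queue. The function names, the batch size, and the exact row comparison (np.array_equal) are illustrative assumptions, not part of the original code.

import numpy as np
import multiprocessing

def queue_worker(offset, a_chunk, b, queue, batch=100):
    # Compare each row of a_chunk against every row of b and push the matching
    # index pairs onto the queue in small pieces instead of one large result.
    matches = []
    for i, row in enumerate(a_chunk):
        for j, other in enumerate(b):
            if np.array_equal(row, other):
                matches.append((offset + i, j))
        if len(matches) >= batch:
            queue.put(matches)          # partial results stay small on the wire
            matches = []
    queue.put(matches)
    queue.put(None)                     # sentinel: this worker is finished

def compare_samples_queue(a, b, nproc=4):
    queue = multiprocessing.Queue()
    chunk = (len(a) + nproc - 1) // nproc
    procs = []
    for k in range(nproc):
        p = multiprocessing.Process(target=queue_worker,
                                    args=(k*chunk, a[k*chunk:(k+1)*chunk], b, queue))
        p.start()
        procs.append(p)
    done, results = 0, []
    while done < nproc:                 # drain the queue before joining
        item = queue.get()
        if item is None:
            done += 1
        else:
            results.extend(item)
    for p in procs:
        p.join()
    return results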
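A minimal sketch of option 2, again assuming a Unix fork start method and Python 2 semantics for mmap item access: the main process creates an anonymous shared mmap with one result byte per row of the first sample, and each forked worker writes its match flags straight into that buffer, so nothing large ever has to be pickled back. The one-byte-per-row layout is an illustrative assumption.

import mmap
import numpy as np
import multiprocessing

def mmap_worker(offset, a_chunk, b, shared):
    # Flag rows of a_chunk that also appear in b by writing directly into the
    # shared buffer set up by the main process.
    for i, row in enumerate(a_chunk):
        for other in b:
            if np.array_equal(row, other):
                shared[offset + i] = '\x01'
                break

def compare_samples_mmap(a, b, nproc=4):
    shared = mmap.mmap(-1, len(a))      # anonymous map, shared with forked children
    chunk = (len(a) + nproc - 1) // nproc
    procs = []
    for k in range(nproc):
        p = multiprocessing.Process(target=mmap_worker,
                                    args=(k*chunk, a[k*chunk:(k+1)*chunk], b, shared))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    # Indices of rows in a that have an identical row somewhere in b.
    return [i for i in range(len(a)) if shared[i] == '\x01']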
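A minimal sketch of option 3, assuming a fork start method so the two samples can be left in module-level globals that the pooled workers inherit: each worker compares one slice of the first sample, writes its matches to a temporary text file, and returns only the file name, which the main process then reads back in. The globals, the chunk size, and the simple "i j" line format are illustrative assumptions.

import os
import tempfile
import numpy as np
import multiprocessing

sample_a = None      # set in the main process before the Pool is created,
sample_b = None      # so forked workers inherit the arrays without pickling

def compare_to_file(bounds):
    # Compare rows sample_a[start:stop] against all of sample_b and write the
    # matching index pairs to a temporary text file; return only the file name.
    start, stop = bounds
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as out:
        for i in range(start, stop):
            for j in range(len(sample_b)):
                if np.array_equal(sample_a[i], sample_b[j]):
                    out.write('%d %d\n' % (i, j))
    return path

def compare_samples_files(a, b, chunk_size=1000):
    global sample_a, sample_b
    sample_a, sample_b = a, b
    pool = multiprocessing.Pool()
    bounds = [(s, min(s + chunk_size, len(a))) for s in range(0, len(a), chunk_size)]
    paths = pool.map(compare_to_file, bounds)
    pool.close()
    pool.join()
    matches = []
    for path in paths:
        with open(path) as f:
            matches.extend(tuple(int(x) for x in line.split()) for line in f)
        os.remove(path)                 # clean up the temporary result file
    return matches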
