How to scatter and gather a Python list of objects in mpi4py

Question

I have a list of 100,000 Python objects that I would like to scatter and gather in mpi4py.

When I try with 8 processors I get:

SystemError: Negative size passed to PyBytes_FromStringAndSize

on the scatter.

When I try with 64 processors I get the same error, but on the gather.

When I try making an array of objects out of the list and use Gather and Scatter, I get an error which basically states that the dtype of the array cannot be object.

Is there any way I can get this to work? Or is there anything other than MPI that I could use?
I'm running this on an 8-node, 64-ppn computer.

Answer 1

Here is an example that uses Scatter and Allgather to split a numpy array with 100,000 items.

import numpy as np
from mpi4py import MPI
from pprint import pprint
comm = MPI.COMM_WORLD

pprint("-" * 78)
pprint(" Running on %d cores" % comm.size)
pprint("-" * 78)

N = 100000
# Items per rank; Scatter needs equal chunks, so N must be divisible by comm.size
my_N = N // comm.size

if comm.rank == 0:
    A = np.arange(N, dtype=np.float64)
else:
    A = np.empty(N, dtype=np.float64)

my_A = np.empty(my_N, dtype=np.float64)

# Scatter data 
comm.Scatter([A, MPI.DOUBLE], [my_A, MPI.DOUBLE])

pprint("After Scatter:")
for r in range(comm.size):
    if comm.rank == r:
        print("[%d] %s" % (comm.rank, len(my_A)))
    comm.Barrier()

# Allgather data into A
comm.Allgather([my_A, MPI.DOUBLE], [A, MPI.DOUBLE])

pprint("After Allgather:")
for r in range(comm.size):
    if comm.rank == r:
        print("[%d] %s" % (comm.rank, len(A)))
    comm.Barrier()

You could also check Scatterv and Gatherv, which allow a different number of items per rank.
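
To make the Scatterv and Gatherv suggestion concrete, here is a minimal sketch for the case where the number of items is not divisible by the number of ranks (for example 100,000 items on 64 ranks). The helper `counts_and_displs` and the function `scatterv_gatherv_demo` are hypothetical names introduced for illustration, and the doubling step is only a stand-in for real work.

```python
def counts_and_displs(n_items, n_ranks):
    """Split n_items as evenly as possible over n_ranks.

    Returns (counts, displs): rank i receives counts[i] items that start
    at offset displs[i] in the flat send buffer.
    """
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks receive one additional item each.
    counts = [base + 1 if r < extra else base for r in range(n_ranks)]
    displs = [sum(counts[:r]) for r in range(n_ranks)]
    return counts, displs


def scatterv_gatherv_demo(n_items=100000):
    """Scatter a float64 array in uneven chunks, work on it, gather it back."""
    # Imported here so the helper above can be used without MPI installed.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    counts, displs = counts_and_displs(n_items, comm.size)

    # Only the root needs the full array.
    A = np.arange(n_items, dtype=np.float64) if comm.rank == 0 else None
    my_A = np.empty(counts[comm.rank], dtype=np.float64)

    # The send buffer is only significant on the root.
    sendbuf = [A, counts, displs, MPI.DOUBLE] if comm.rank == 0 else None
    comm.Scatterv(sendbuf, my_A, root=0)

    my_A *= 2.0  # stand-in for the real per-rank computation

    # The receive buffer is only significant on the root.
    recvbuf = [A, counts, displs, MPI.DOUBLE] if comm.rank == 0 else None
    comm.Gatherv(my_A, recvbuf, root=0)
    return A  # full result on rank 0, None on the other ranks
```

To try it, call `scatterv_gatherv_demo()` at the bottom of a script and launch that script under your MPI launcher, for example `mpiexec -n 8 python script.py`. This only covers buffer-like data such as numpy arrays with a numeric dtype; arbitrary Python objects still need the lowercase, pickle-based methods.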

Answer 2

I'm not sure this is the answer, and I'm also not sure you are still looking for one, but...

So you have 100,000 Python objects. If these objects are regular data (data sets), not instances of some class, pass the data as JSON strings. Something like this:

#!/usr/bin/env python

import json
import numpy as np
from mpi4py import MPI


comm = MPI.COMM_WORLD

# scatter() needs exactly one item per rank, so run this with 3 processes
if comm.rank == 0:
    tasks = [
        json.dumps({'a': 1, 'x': 2, 'b': 3}),
        json.dumps({'a': 3, 'x': 1, 'b': 2}),
        json.dumps({'a': 2, 'x': 3, 'b': 1})
    ]
else:
    tasks = None


# Scatter parameter sets, one JSON string per rank
unit = comm.scatter(tasks, root=0)

p = json.loads(unit)
print("-" * 18)
print("-- I'm rank %d in %d size task" % (comm.rank, comm.size))
print("-- My parameters are: {}".format(p))
print("-" * 18)

comm.Barrier()

calc = p['a']*p['x']**2+p['b']

# gather results
result = comm.gather(calc, root=0)
# do something with result

if comm.rank == 0:
    print("the result is", result)
else:
    result = None

Note that scatter takes exactly one record per rank, so if you have only 8 nodes/cores, the tasks list must hold 8 records at a time, and you have to scatter and gather repeatedly until all 100,000 data sets are processed. If all your data sets are in an ALLDATA list, the code could look like this:

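def calc(a=0, x=0, b=0):
    return a * x**2 + b

# ALLDATA must exist with the same length on every rank, because every
# rank runs this loop; only the contents on rank 0 are actually sent.
if comm.rank == 0:
    collector = []
# Take comm.size data sets per round. zip() silently drops a final
# incomplete group, so len(ALLDATA) should be a multiple of comm.size.
for xset in zip(*(iter(ALLDATA),) * comm.size):
    task = [json.dumps(s) for s in xset]
    comm.Barrier()
    unit = comm.scatter(task if comm.rank == 0 else None, root=0)
    p = json.loads(unit)
    res = json.dumps(calc(**p))
    totres = comm.gather(res, root=0)
    if comm.rank == 0:
        collector += [json.loads(x) for x in totres]

if comm.rank == 0:
    print("the result is", collector)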
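
The loop above sends one record per rank per round, which means 12,500 rounds for 100,000 data sets on 8 ranks. A possible variation is to scatter a sublist to each rank in a few larger rounds, as sketched below. The names `split_into_chunks`, `scatter_process_gather`, `work` and `batch_size` are hypothetical and introduced only for illustration. Keeping `batch_size` bounded also keeps each pickled message small; the SystemError in the question is often reported when a single pickled message exceeds what a signed 32-bit size can hold (roughly 2 GiB), but that diagnosis is an assumption here, not something confirmed in this thread.

```python
def split_into_chunks(items, n_chunks):
    """Return n_chunks consecutive sublists whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for r in range(n_chunks):
        stop = start + base + (1 if r < extra else 0)
        chunks.append(items[start:stop])
        start = stop
    return chunks


def scatter_process_gather(all_items, work, batch_size=10000):
    """Apply work() to every item, sharing each batch across all ranks.

    all_items only needs to hold real data on rank 0; other ranks may
    pass None. The ordered list of results is returned on rank 0 and
    None is returned on the other ranks.
    """
    # Imported here so split_into_chunks can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    # Every rank must run the same number of rounds, so share the length.
    n_total = comm.bcast(len(all_items) if comm.rank == 0 else None, root=0)

    results = [] if comm.rank == 0 else None
    for start in range(0, n_total, batch_size):
        if comm.rank == 0:
            batch = all_items[start:start + batch_size]
            chunks = split_into_chunks(batch, comm.size)  # one sublist per rank
        else:
            chunks = None
        my_chunk = comm.scatter(chunks, root=0)      # pickle-based scatter
        my_results = [work(obj) for obj in my_chunk]
        gathered = comm.gather(my_results, root=0)   # list of lists on rank 0
        if comm.rank == 0:
            for part in gathered:
                results.extend(part)
    return results
```

Because the chunks are consecutive slices and gather returns them in rank order, the results on rank 0 come back in the same order as the input. The objects and the results must be picklable, and `work` has to be available on every rank.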
