Shared Memory Array of Strings with Multiprocessing
I'm trying to multiprocess some existing code, and I'm finding that the pickling/unpickling of data to the processes is too slow with a Pool. I think for my situation a Manager will suffer from the same issue, since it does the same pickling behind the scenes.
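Before reaching for shared memory, one way to sanity-check how much the serialization costs is a small round-trip timing (a hypothetical micro-benchmark, not from the original post; Python 3 syntax):

```python
import pickle
import time

# Roughly simulate what a Pool pays per task: serialize the arguments,
# deserialize them in the worker, and the same again for the result.
data = ['x' * 100 for _ in range(100000)]

start = time.time()
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
elapsed = time.time() - start

print('round-trip: %.3fs, payload: %d bytes' % (elapsed, len(blob)))
```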
To solve the issue I'm trying to move to a shared-memory array. For this to work, I need an array of strings. It seems that multiprocessing.Array supports ctypes.c_char_p, but I'm having difficulty extending this into an array of strings. Below are a few of the many things I've tried.
#!/usr/bin/python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# Tested possible solutions
ver = 1
if 1 == ver:
    strings = mpsc.RawArray(ctypes.c_char_p, (' '*10, ' '*10, ' '*10, ' '*10))
elif 2 == ver:
    tmp_strings = [mpsc.RawValue(ctypes.c_char_p, ' '*10) for i in xrange(4)]
    strings = mpsc.RawArray(ctypes.c_char_p, tmp_strings)
elif 3 == ver:
    strings = []
    for i in xrange(4):
        strings.append(mpsc.RawValue(ctypes.c_char_p, 10))

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum] = string
    return string

# Main program
data = [(i, numpy.random.randint(1, 10)) for i in xrange(3)]

print 'Testing version ', ver
print
print 'Single process'
for x in map(worker, data):
    print '%10s : %s' % (x, list(strings))
print

print 'Multi-process'
pool = mp.Pool(3)
for x in pool.map(worker, data):
    print '%10s : %s' % (x, list(strings))
    print '  ', [isinstance(s, str) for s in strings]
Note that I'm using multiprocessing.sharedctypes because I don't need locking, and it should be fairly interchangeable with multiprocessing.Array.
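As a quick illustration of that difference (a Python 3 sketch, not part of the original post): multiprocessing.Array returns a synchronized wrapper carrying a lock, while mpsc.RawArray is a bare ctypes array in shared memory:

```python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc

locked = mp.Array(ctypes.c_char, 10)    # synchronized wrapper around shared memory
raw = mpsc.RawArray(ctypes.c_char, 10)  # bare shared-memory ctypes array, no lock

print(hasattr(locked, 'get_lock'))  # the wrapper exposes its lock
print(hasattr(raw, 'get_lock'))     # the raw array does not
```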
The issue with the above code is that the resultant strings object contains regular strings, not shared-memory strings coming out of the mpsc.RawArray constructor. With versions 1 and 2 you can see how the data gets scrambled when working out of process (as expected). For me, version 3 looked like it worked initially, but you can see the = is just setting the object to a regular string; while this works for the short test, in the larger program it creates issues.
It seems like there should be a way to create a shared array of pointers where the pointers point to strings in shared-memory space. The c_void_p type complains if you try to initialize it with a c_char_p type, and I haven't had any luck manipulating the underlying address pointers directly yet.
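For what it's worth, the core problem is visible even without a second process (a Python 3 sketch, where c_char_p holds bytes): each slot of a c_char_p array is only a pointer, so assigning a string stores the address of a process-local object rather than copying the bytes into shared memory:

```python
import ctypes
import multiprocessing.sharedctypes as mpsc

# The array of c_char_p lives in shared memory, but each slot is only a
# pointer. Assigning stores the *address* of a process-local bytes object;
# the pointed-to bytes are NOT copied into shared memory.
arr = mpsc.RawArray(ctypes.c_char_p, 4)
arr[0] = b'hello'

# Reading back works here because the address is still valid in this process.
print(arr[0])

# The slot itself is just an integer address -- meaningless in any other
# process, which is why versions 1 and 2 come out scrambled.
addr = ctypes.cast(arr, ctypes.POINTER(ctypes.c_void_p))[0]
print(hex(addr))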
Any help would be appreciated.
First, your third solution doesn't work: strings isn't changed by the multiprocessing part; it was only modified by the single-process part. You can verify this by commenting out your single-process part.
Second, this one will work:
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in xrange(4)]

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum].value = string
    return string

# Main program
data = [(i, numpy.random.randint(1, 10)) for i in xrange(4)]

print 'Multi-process'
print "Before: %s" % [item.value for item in strings]
pool = mp.Pool(4)
pool.map(worker, data)
print 'After : %s' % [item.value for item in strings]
Output:
Multi-process
Before: ['', '', '', '']
After : ['0000000', '111111', '222', '3333']