Shared Memory Array of Strings with Multiprocessing

I'm trying to parallelize some existing code with multiprocessing, and I'm finding that pickling and unpickling the data sent to the worker processes is too slow with a Pool. I think a Manager will suffer from the same problem in my situation, since it does the same pickling behind the scenes.

To solve this, I'm trying to move to a shared memory array. For that to work, I need an array of strings. It seems that multiprocessing.Array supports ctypes.c_char_p, but I'm having difficulty extending this into an array of strings. Below are a few of the many things I've tried.

#!/usr/bin/python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# Tested possible solutions
ver = 1
if 1==ver:
    strings = mpsc.RawArray(ctypes.c_char_p, (' '*10, ' '*10, ' '*10, ' '*10))
elif 2==ver:
    tmp_strings = [mpsc.RawValue(ctypes.c_char_p, ' '*10) for i in xrange(4)]
    strings = mpsc.RawArray(ctypes.c_char_p, tmp_strings)
elif 3==ver:
    strings = []
    for i in xrange(4):
        strings.append( mpsc.RawValue(ctypes.c_char_p, 10) )

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum] = string
    return string

# Main program
data = [(i, numpy.random.randint(1,10)) for i in xrange(3)]
print 'Testing version ', ver
print
print 'Single process'
for x in map(worker, data):
    print '%10s : %s' % (x, list(strings))
print

print 'Multi-process'
pool = mp.Pool(3)
for x in pool.map(worker, data):
    print '%10s : %s' % (x, list(strings))
    print '            ', [isinstance(s, str) for s in strings]

Note that I'm using multiprocessing.sharedctypes because I don't need locking, and it should be fairly interchangeable with multiprocessing.Array.
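To illustrate the locking difference mentioned above, here is a minimal sketch. It assumes Python 3 (the code in this post is Python 2), and the variable names are made up for the example. As far as the standard library documents it, multiprocessing.Array wraps the buffer in a synchronized object with a get_lock() method unless lock=False is passed, while RawArray returns the bare ctypes array.

```python
import ctypes
import multiprocessing as mp
from multiprocessing.sharedctypes import RawArray

# Bare ctypes array in shared memory: no lock attached.
raw = RawArray(ctypes.c_char, 10)
raw.value = b"abc"

# Synchronized wrapper: same kind of buffer plus a lock.
locked = mp.Array(ctypes.c_char, 10)
with locked.get_lock():
    locked.value = b"abc"

# lock=False skips the wrapper, so no lock is attached.
unlocked = mp.Array(ctypes.c_char, 10, lock=False)

print(hasattr(raw, "get_lock"), hasattr(locked, "get_lock"), hasattr(unlocked, "get_lock"))
```

For write patterns where each worker touches only its own slot, as in the code above, the lock-free form is usually sufficient.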

The issue with the above code is that the resulting strings object contains regular strings, not the shared memory strings that came out of the mpsc.RawArray constructor. With versions 1 and 2 you can see how the data gets scrambled when working out of process (as expected). Version 3 initially looked like it worked, but the = is just rebinding the list element to a regular string; this works for the short test, but in the larger program it creates issues.

It seems like there should be a way to create a shared array of pointers where the pointers point to strings in shared memory. The c_void_p type complains if you try to initialize it with a c_char_p value, and I haven't had any luck manipulating the underlying address pointers directly yet.
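For illustration, a minimal sketch of what an array of c_char_p actually places in shared memory. It assumes Python 3 (so string data is written as bytes), and the variable names are hypothetical. The shared block holds only pointer-sized slots; the text each slot points to stays in ordinary memory of the process that assigned it, which is consistent with the scrambled results seen from other processes. An array of c_char, by contrast, holds the characters themselves.

```python
import ctypes
from multiprocessing.sharedctypes import RawArray

# Four pointer-sized slots live in shared memory; the text they point to does not.
pointers = RawArray(ctypes.c_char_p, 4)
pointers[0] = b"0123456789" * 10   # 100 bytes of text, kept in ordinary process memory
pointer_block_size = ctypes.sizeof(pointers)

# Ten bytes of character data live in shared memory.
chars = RawArray(ctypes.c_char, 10)
chars.value = b"222"               # the bytes are copied into the shared buffer

# The pointer array's size depends only on the slot count, not on the text length.
print(pointer_block_size == 4 * ctypes.sizeof(ctypes.c_char_p))
print(ctypes.sizeof(chars))
print(chars.raw)
```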

Any help would be appreciated.

First, your third solution doesn't work: strings isn't changed by the multiprocessing part; it only appears to work because the single-process part has already modified it. You can check this by commenting out the single-process part.

Second, this one will work:

import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# One fixed-size (10-byte) shared character buffer per string slot
strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in xrange(4)]

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    # Assign through .value so the characters are copied into the shared buffer
    strings[snum].value = string
    return string

# Main program
data = [(i, numpy.random.randint(1,10)) for i in xrange(4)]

print 'Multi-process'
print "Before: %s" % [item.value for item in strings]
pool = mp.Pool(4)
pool.map(worker, data)
print 'After : %s' % [item.value for item in strings]

Output:

Multi-process
Before: ['', '', '', '']
After : ['0000000', '111111', '222', '3333']
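
For readers on Python 3, the same fixed-width buffer idea can be sketched as follows. This is an adaptation under stated assumptions, not the code from the answer: SLOT_SIZE, NUM_SLOTS, init_worker and main are hypothetical names, the repeat counts are fixed instead of random, and the buffers are handed to the workers through the Pool initializer so the sketch does not rely on fork-style inheritance of module globals. Text must be encoded to bytes before it is assigned to .value.

```python
import ctypes
import multiprocessing as mp
from multiprocessing.sharedctypes import RawArray

SLOT_SIZE = 10   # bytes reserved per string (assumed fixed width)
NUM_SLOTS = 4

_slots = None    # set in each worker process by init_worker


def init_worker(shared_slots):
    """Keep a reference to the shared buffers in a module-level name."""
    global _slots
    _slots = shared_slots


def worker(args):
    """Write the digit `index` repeated `count` times into slot `index`."""
    index, count = args
    text = str(index) * count
    # .value copies the bytes into shared memory; it raises ValueError
    # if the encoded text is longer than SLOT_SIZE.
    _slots[index].value = text.encode("ascii")
    return text


def main():
    slots = [RawArray(ctypes.c_char, SLOT_SIZE) for _ in range(NUM_SLOTS)]
    data = [(i, i + 1) for i in range(NUM_SLOTS)]
    print("Before:", [s.value for s in slots])
    with mp.Pool(NUM_SLOTS, initializer=init_worker, initargs=(slots,)) as pool:
        pool.map(worker, data)
    print("After :", [s.value for s in slots])
    return slots


if __name__ == "__main__":
    main()
```

The trade-off of this layout is that every string is capped at SLOT_SIZE bytes, so the width has to be chosen for the longest value the workers will ever write.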
