
python multiprocessing sharing file in memory

I'm implementing a multiprocessing program in Python, and each of the subprocesses needs to read part of a file.

Since reading the file from disk is expensive, I want to read it only once and put it in shared memory.

1. If I use mmap, it works with fork, but I can't find a way to share the mmapped file between Processes in the multiprocessing module (see the sketch after this list).

2. If I read the file into a str and store the string in sharedctypes.RawArray('c', str), an error occurs when there is a \0 in the str: the generated RawArray is a truncated copy of the file.
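A minimal sketch of approach 1 as it stands, assuming the fork start method (the default on Linux), so the mapping created in the parent is inherited as a global by the children rather than passed as a pickled argument; the file name, chunk size and worker function below are made up for illustration:

import mmap
import multiprocessing as mp

CHUNK = 1024 * 1024  # hypothetical slice size per worker

def worker(index):
    # With fork, the child inherits `shared_map` from the parent,
    # so the file is mapped once and never re-read from disk here.
    part = shared_map[index * CHUNK:(index + 1) * CHUNK]
    print(index, len(part))

if __name__ == '__main__':
    with open('big_input.bin', 'rb') as f:   # hypothetical input file
        shared_map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    procs = [mp.Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    shared_map.close()

For approach 2, if the truncation comes from reading the array back through its .value attribute (which stops at the first null byte), reading through .raw instead returns the full buffer; alternatively, allocate RawArray('c', len(data)) and assign the bytes to its .raw attribute.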

Any ideas?

Could you use the multiprocessing Managers? Make the mmapped file an attribute of the Namespace object returned by the Namespace() function and pass a reference to it to each of the processes.

from multiprocessing import Manager

mgr = Manager()
ns = mgr.Namespace()
ns.df = my_dataframe   # my_dataframe: the object you want to share

# now just give your processes access to ns, i.e. most simply
# p = Process(target=worker, args=(ns, work_unit))
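A rough usage sketch of the above, assuming the shared object is the file contents read as bytes (the file name big_input.bin, the worker function and the slicing are hypothetical, not part of the original answer):

from multiprocessing import Manager, Process

def worker(ns, offset, length):
    # Each attribute access goes through the manager proxy and returns
    # a copy of the stored object to this process.
    data = ns.data
    print(offset, len(data[offset:offset + length]))

if __name__ == '__main__':
    with open('big_input.bin', 'rb') as f:   # hypothetical input file
        file_bytes = f.read()

    mgr = Manager()
    ns = mgr.Namespace()
    ns.data = file_bytes                     # stands in for ns.df above

    procs = [Process(target=worker, args=(ns, i * 1024, 1024)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()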

(My answer is basically copied from here.)
