
Python dies when attempting to access shared memory

I'm working on a piece of code that takes as input a block of shared memory (created using multiprocessing.shared_memory in Python 3.8) containing an ordered list of numbers, together with a numpy array; the goal is a final array in shared memory that is the ordered union of these two sets.

The blocks of shared memory are quite large (~8 GB at the point where it fails), so to speed up the union, the input block of shared memory is divided evenly between several processes, each of which performs the union on its own chunk; the results are combined in another (separate) block of shared memory. These unions are performed multiple times, so in practice many blocks of shared memory of increasing size are allocated over the lifetime of the program. However, as soon as a union has completed, the old block of shared memory is unlinked, so at any given time there are at most two active blocks of shared memory, each ~8 GB in size at the point of death.
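For context, the per-chunk step described above can be sketched roughly as follows. This is a minimal single-process illustration, not the original code: the function name, sizes, and dtype are assumptions, and np.union1d is used here as a convenient stand-in for "sorted union".

```python
import numpy as np
from multiprocessing import shared_memory

def sorted_union_via_shm(shm_name, length, other):
    """Merge a sorted int64 array living in shared memory with `other`,
    writing the sorted union into a freshly created shared-memory block."""
    # Attach to the existing block by name and view it as a numpy array
    src = shared_memory.SharedMemory(name=shm_name)
    a = np.ndarray((length,), dtype=np.int64, buffer=src.buf)

    # np.union1d sorts and de-duplicates, i.e. the ordered union
    merged = np.union1d(a, other)

    # Allocate the result block and copy the union into it
    dst = shared_memory.SharedMemory(create=True, size=merged.nbytes)
    out = np.ndarray(merged.shape, dtype=merged.dtype, buffer=dst.buf)
    out[:] = merged

    src.close()  # detach this handle; the creator is responsible for unlink()
    return dst, merged.shape[0]

# Usage: build a small source block, union it with a plain numpy array
src = shared_memory.SharedMemory(create=True, size=4 * 8)
np.ndarray((4,), dtype=np.int64, buffer=src.buf)[:] = [1, 3, 5, 7]

dst, n = sorted_union_via_shm(src.name, 4, np.array([2, 3, 8], dtype=np.int64))
result = np.ndarray((n,), dtype=np.int64, buffer=dst.buf)
union_list = result.tolist()
print(union_list)  # [1, 2, 3, 5, 7, 8]

dst.close(); dst.unlink()
src.close(); src.unlink()
```

In the real program each worker would run something like this on its own slice, with the results stitched into one shared output block.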

The problem I'm experiencing is that upon a call to access the shared memory block, the process dies without raising any errors. The work is done in a child process (I'll call it the worker), spawned from another child process that was itself created from the main process. The worker doesn't appear to raise any errors, and I've tried other ways of catching them, such as wrapping the worker's code in try: ... except Exception as e: ... and communicating error information back to the main process through a Pipe, but as far as I can tell no errors are raised and the worker simply dies silently. Specifically, it dies on the following line in the worker process:

shm_block = mp.shared_memory.SharedMemory(name=shm_key)

I've been running the code on a Linux server with 64 GB of RAM, and a call to df -k /dev/shm suggests that I have 32 GB of shared memory available. I've been running the program with 8 worker processes (it fails with fewer processes as well, such as 2 or 4 workers), and it seems to run smoothly up to this threshold just above 8 GB, at which point the workers silently die. I've tried to create a minimal reproducible example using the same dataset on a much smaller machine (10 GB RAM), but in that case a MemoryError is raised. I've also looked into shm_open and mmap, which lie underneath the SharedMemory module, to see if there is a limit on the size of a shared memory block, but I haven't come across anything.
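The df figure above can also be checked from inside Python. A small sketch (Linux-specific: it assumes shared memory is backed by the tmpfs mounted at /dev/shm, which is where CPython's shared_memory blocks live on Linux):

```python
import os

# Report the capacity of the tmpfs behind /dev/shm, where SharedMemory
# blocks are created on Linux. Path is platform-specific.
if os.path.exists('/dev/shm'):
    st = os.statvfs('/dev/shm')
    total_gib = st.f_blocks * st.f_frsize / 2**30
    free_gib = st.f_bavail * st.f_frsize / 2**30
    msg = f"/dev/shm: {total_gib:.1f} GiB total, {free_gib:.1f} GiB free"
else:
    msg = "/dev/shm not present on this platform"
print(msg)
```

If the two ~8 GB blocks plus the allocation in progress approach the tmpfs size, writes to the mapping can fail even though the initial allocation appeared to succeed, which is consistent with a crash near a fixed size threshold.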

Any advice would be greatly appreciated, and I'm happy to provide more of the code if that would be helpful. Thanks in advance!

I came across the same problem. For me, the cause was an access violation, which Python does not surface as an exception.

from multiprocessing import shared_memory

shm_list = shared_memory.ShareableList([0] * 29, name='shm_xyz')
b = 10 ** 1000  # far larger than the 8 bytes reserved for each int slot
try:
    shm_list[4] = b
except Exception as e:
    print(e)
print('I arrived at that point')
shm_list.shm.close()   # release this process's handle
shm_list.shm.unlink()  # free the underlying shared-memory block

Usually, this should throw an "argument out of range" error. In my project I am doing pretty much the same thing, but for some reason (which I was not able to reproduce) it does not raise the exception and simply crashes instead. When I called the Python script from C++ I noticed that an access violation is thrown. As far as I understood, this access violation is raised by Windows or at the hardware level, and therefore immediately crashes Python.

The solution for me was to avoid using numbers that are too big.

Sorry if I wasn't of much help; I thought I'd still share this in case someone finds the question while searching for the problem like I did.

Note: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please cite this site or the original source. For any questions, contact yoyou2525@163.com.
