
Memory leaks in JPype with multiprocessing

I have Python code that uses a Java library by means of JPype. Currently, each run of my function checks whether the JVM exists, and creates it if it does not:

import jpype as jp

def myfunc(i):
  # jar_location is defined elsewhere and points at the Java library's jar
  if not jp.isJVMStarted():
    jp.startJVM(jp.getDefaultJVMPath(), '-ea', '-Djava.class.path=' + jar_location)
  do_something_hard(i)

Further, I want to parallelize my code using the Python multiprocessing library. Each thread (supposedly) works independently, calculating the value of my function with different parameters. For example:

import numpy as np
import pathos

pool = pathos.multiprocessing.ProcessingPool(8)
params = np.arange(100)
result = pool.map(myfunc, params)

This construction works fine, except that it has dramatic memory leaks when using more than one core in the pool. I notice that all memory is freed up when Python is closed, but memory still accumulates over time while pool.map is running, which is undesirable.

The JPype documentation is incredibly brief, suggesting to synchronize threads by wrapping Python threads with jp.attachThreadToJVM and jp.detachThreadFromJVM. However, I cannot find a single example online of how to actually do it. I have tried wrapping the function do_something_hard inside myfunc with these statements, but it had no effect on the leak. I have also attempted to explicitly close the JVM at the end of myfunc using jp.shutdownJVM. However, in this case the JVM seems to crash as soon as I use more than one core, leading me to believe that there is a race condition.

Please help:

  • What is going on? Why would there be a race condition? Is it not the case that each thread makes its own JVM?
  • What is the correct way to free up memory in my scenario?

The problem is with the nature of multiprocessing. Python can either fork or spawn a new process. The fork option appears to have significant problems with the JVM. The default on Linux is fork.

Using the spawn context (multiprocessing.get_context("spawn")) to create a spawned version of Python allows a fresh JVM to be created. Each spawned copy is completely independent. There are examples in subrun.py in the test directory on GitHub, as that is what is used to test different JVM options for JPype.
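The spawn-based pattern can be sketched as below. So that the sketch runs without Java installed, a module-level flag stands in for the real JVM; the comments mark where the JPype calls from the question (jp.isJVMStarted, jp.startJVM, do_something_hard) would go in actual code.

```python
# Sketch of the spawn approach: each worker is a fresh interpreter, so the
# "JVM" (a stand-in flag here) is started once per worker, never inherited.
import multiprocessing as mp

_started = False  # stand-in for jp.isJVMStarted()

def myfunc(i):
    global _started
    if not _started:       # real code: if not jp.isJVMStarted():
        _started = True    # real code: jp.startJVM(jp.getDefaultJVMPath(), ...)
    return i * i           # stand-in for do_something_hard(i)

if __name__ == '__main__':
    ctx = mp.get_context("spawn")   # fresh process per worker, no forking
    with ctx.Pool(4) as pool:
        result = pool.map(myfunc, range(8))
    print(result)
```

Because spawn re-imports the main module in each worker, the pool must be created under the `if __name__ == '__main__':` guard; each worker then starts its own "JVM" exactly once.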

The fork version creates a copy of the original process, including the previously running JVM. At least from my testing, the forked JVM does not work as expected. Older versions of JPype (0.6.x) would allow the forked copy to call startJVM, which would create a big memory leak. The current version, 0.7.1, gives an exception that the JVM cannot be restarted.

If you are using threads (rather than processes), all threads share the same JVM and do not need to start it independently. There is further documentation on the use of multiprocessing with JPype in the latest documentation on GitHub, under the "limitations" section.
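For comparison, a thread-based sketch: the one "JVM" (again a stand-in flag rather than a real jp.startJVM call) is started once up front and shared by every worker thread.

```python
# Sketch: threads all live in one process, so a single JVM (stand-in
# flag here) is started once and is visible to every worker thread.
from concurrent.futures import ThreadPoolExecutor

# Real code would call jp.startJVM(...) exactly once, right here.
jvm_started = True

def myfunc(i):
    assert jvm_started   # every thread sees the same shared "JVM"
    return i * i         # stand-in for a call into the Java library

with ThreadPoolExecutor(max_workers=8) as ex:
    result = list(ex.map(myfunc, range(8)))
```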
