How can scipy.weave.inline be used in an MPI-enabled application on a cluster?

If scipy.weave.inline is called inside a massively parallel MPI-enabled application that runs on a cluster with a home directory common to all nodes, every instance accesses the same catalog for compiled code: $HOME/.pythonxx_compiled. This is bad for obvious reasons and leads to many error messages. How can this problem be circumvented?

As per the scipy docs, you could store your compiled data in a directory that isn't on the NFS share (such as /tmp or /scratch or whatever is available on your system). Then you wouldn't have to worry about conflicts. You just need to set the PYTHONCOMPILED environment variable to something else.
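For example, a minimal sketch of that approach (the /scratch base path is only an assumption; use /tmp or whatever local disk your cluster provides):

import os
import getpass

# Point weave's compiled-code cache at node-local storage instead of the shared $HOME.
cache_dir = os.path.join('/scratch', getpass.getuser(), 'pythoncompiled')
try:
    os.makedirs(cache_dir)
except OSError:
    pass  # directory may already exist

os.environ['PYTHONCOMPILED'] = cache_dir  # set before weave compiles anything

The variable can just as well be exported in the job script before Python starts.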

My previous thoughts about this problem:

Either scipy.weave.catalog has to be enhanced with a proper locking mechanism in order to serialize access to the catalog, or every instance has to use its own catalog.

I chose the latter. The scipy.weave.inline function uses a catalog which is bound to the module-level name function_catalog of the scipy.weave.inline_tools module. This can be discovered by looking into the code of this module (https://github.com/scipy/scipy/tree/v0.12.0/scipy/weave).

The simplest solution is to monkeypatch this name to something else at the beginning of the program:

from mpi4py import MPI

import numpy as np

import scipy.weave.inline_tools
import scipy.weave.catalog

import os
import os.path

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# some_path is a base directory of your choosing; each rank gets its own catalog beneath it
catalog_dir = os.path.join(some_path,  'rank'+str(rank))
try:
    os.makedirs(catalog_dir)
except OSError:
    pass  # directory may already exist

#monkeypatching the catalog
scipy.weave.inline_tools.function_catalog = scipy.weave.catalog.catalog(catalog_dir)

Now inline works smoothly: each instance has its own catalog inside the common NFS directory. Of course this naming scheme breaks if two distinct parallel tasks run at the same time, but that would also be the case if the catalog were in /tmp.

Edit: As mentioned in a comment above, this procedure still has problems if multiple independent jobs are run in parallel. This can be remedied by adding a random uuid to the pathname:

import uuid

u = None
if rank == 0:
    u = str(uuid.uuid4())

# rank 0 generates a single uuid for the run; scatter hands the same value to every rank
u = comm.scatter([u]*size, root=0)

catalog_dir = os.path.join('/tmp/<username>/pythoncompiled',  u+'-'+str(rank))
os.makedirs(catalog_dir)

#monkeypatching the catalog
scipy.weave.inline_tools.function_catalog = scipy.weave.catalog.catalog(catalog_dir)

Of course it would be nice to delete those files after the computation:

import shutil

shutil.rmtree(catalog_dir)

Edit: There were some additional problems. The intermediate directory where the cpp and o files are stored also caused some trouble due to simultaneous access from different instances, so the above method has to be extended to this directory:

basetmp = some_path
catalog_dir = os.path.join(basetmp, 'pythoncompiled',  u+'-'+str(rank))
intermediate_dir = os.path.join(basetmp, 'pythonintermediate',  u+'-'+str(rank))

os.makedirs(catalog_dir, mode=0o700)
os.makedirs(intermediate_dir, mode=0o700)

#monkeypatching the catalog and intermediate_dir
scipy.weave.inline_tools.function_catalog = scipy.weave.catalog.catalog(catalog_dir)
scipy.weave.catalog.intermediate_dir = lambda: intermediate_dir

#... calculations here ...

shutil.rmtree(catalog_dir)
shutil.rmtree(intermediate_dir)

A quick workaround is to use a local directory on each node (e.g. /tmp, as Wesley said), but use one MPI task per node if you have the capacity.
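A minimal sketch of that workaround, assuming exactly one rank per node (the allgather check only verifies this assumption, and /tmp/weave-cache is merely an example path):

import os
from mpi4py import MPI

comm = MPI.COMM_WORLD

# /tmp is node-local, so ranks on different nodes cannot collide; with at most one
# rank per node there is no intra-node collision either, which the check below confirms.
names = comm.allgather(MPI.Get_processor_name())
if len(set(names)) != len(names):
    raise RuntimeError("more than one MPI task per node: use per-rank cache directories instead")

cache_dir = '/tmp/weave-cache'  # example node-local location
try:
    os.makedirs(cache_dir)
except OSError:
    pass  # directory may already exist
os.environ['PYTHONCOMPILED'] = cache_dir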
