How can scipy.weave.inline be used in an MPI-enabled application on a cluster?

If scipy.weave.inline is called inside a massively parallel MPI-enabled application that runs on a cluster with a home directory common to all nodes, every instance accesses the same catalog for compiled code: $HOME/.pythonxx_compiled. This is bad for obvious reasons and leads to many error messages. How can this problem be circumvented?

As per the scipy docs, you could store your compiled data in a directory that isn't on the NFS share (such as /tmp or /scratch or whatever is available on your system). Then you wouldn't have to worry about conflicts. You just need to set the PYTHONCOMPILED environment variable to something else.
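For example, a minimal sketch of that approach (the /scratch base path is only an assumption; use /tmp or whatever local disk your cluster provides):

import os
import getpass

# Point weave's compiled-code cache at node-local storage instead of the shared $HOME.
cache_dir = os.path.join('/scratch', getpass.getuser(), 'pythoncompiled')
try:
    os.makedirs(cache_dir)
except OSError:
    pass  # directory may already exist

os.environ['PYTHONCOMPILED'] = cache_dir  # set before weave compiles anything

The variable can just as well be exported in the job script before Python starts.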

My previous thoughts about this problem:

Either scipy.weave.catalog has to be enhanced with a proper locking mechanism in order to serialize access to the catalog, or every instance has to use its own catalog.

I chose the latter. The scipy.weave.inline function uses a catalog which is bound to the module-level name function_catalog of the scipy.weave.inline_tools module. This can be discovered by looking into the code of this module (https://github.com/scipy/scipy/tree/v0.12.0/scipy/weave).

The simplest solution is to monkeypatch this name to something else at the beginning of the program:

from mpi4py import MPI

import numpy as np

import scipy.weave.inline_tools
import scipy.weave.catalog

import os
import os.path

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# some_path is a base directory of your choosing; each rank gets its own catalog beneath it
catalog_dir = os.path.join(some_path,  'rank'+str(rank))
try:
    os.makedirs(catalog_dir)
except OSError:
    pass  # directory may already exist

#monkeypatching the catalog
scipy.weave.inline_tools.function_catalog = scipy.weave.catalog.catalog(catalog_dir)

Now inline works smoothly: each instance has its own catalog inside the common NFS directory. Of course this naming scheme breaks if two distinct parallel tasks run at the same time, but that would also be the case if the catalog were in /tmp.

Edit: As mentioned in a comment above, this procedure still has problems if multiple independent jobs are run in parallel. This can be remedied by adding a random uuid to the pathname:

import uuid

u = None
if rank == 0:
    u = str(uuid.uuid4())

# rank 0 generates a single uuid for the run; scatter hands the same value to every rank
u = comm.scatter([u]*size, root=0)

catalog_dir = os.path.join('/tmp/<username>/pythoncompiled',  u+'-'+str(rank))
os.makedirs(catalog_dir)

#monkeypatching the catalog
scipy.weave.inline_tools.function_catalog = scipy.weave.catalog.catalog(catalog_dir)

Of course it would be nice to delete those files after the computation:

import shutil

shutil.rmtree(catalog_dir)

Edit: There were some additional problems. The intermediate directory where the cpp and o files are stored also caused some trouble due to simultaneous access from different instances, so the above method has to be extended to this directory:

basetmp = some_path
catalog_dir = os.path.join(basetmp, 'pythoncompiled',  u+'-'+str(rank))
intermediate_dir = os.path.join(basetmp, 'pythonintermediate',  u+'-'+str(rank))

os.makedirs(catalog_dir, mode=0o700)
os.makedirs(intermediate_dir, mode=0o700)

#monkeypatching the catalog and intermediate_dir
scipy.weave.inline_tools.function_catalog = scipy.weave.catalog.catalog(catalog_dir)
scipy.weave.catalog.intermediate_dir = lambda: intermediate_dir

#... calculations here ...

shutil.rmtree(catalog_dir)
shutil.rmtree(intermediate_dir)

A quick workaround is to use a local directory on each node (e.g. /tmp, as Wesley said), but use one MPI task per node if you have the capacity.
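A minimal sketch of that workaround, assuming exactly one rank per node (the allgather check only verifies this assumption, and /tmp/weave-cache is merely an example path):

import os
from mpi4py import MPI

comm = MPI.COMM_WORLD

# /tmp is node-local, so ranks on different nodes cannot collide; with at most one
# rank per node there is no intra-node collision either, which the check below confirms.
names = comm.allgather(MPI.Get_processor_name())
if len(set(names)) != len(names):
    raise RuntimeError("more than one MPI task per node: use per-rank cache directories instead")

cache_dir = '/tmp/weave-cache'  # example node-local location
try:
    os.makedirs(cache_dir)
except OSError:
    pass  # directory may already exist
os.environ['PYTHONCOMPILED'] = cache_dir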
