How to get IP/hostname of task A's node and pass to task B via SLURM?

I have a (home-grown) cluster network benchmark that I'm trying to run using the SLURM scheduler. The benchmark uses a standard client/server architecture, and the client executable requires the server's IP address (or hostname) as an argument at launch.

Normally I would write a server script that greps the address of the primary NIC and drops it on a shared filesystem, but AFAIK that's not going to work on a cluster node. I also understand that there is a SLURM_JOB_NODELIST environment variable that lets my sbatch script see the list of nodes allocated to the job, but I don't see how that's useful in this case.

How do I determine which node the scheduler has selected to run the benchmark server, and pass that information to the client task before or as it is launched?

I can't believe I didn't think of this before asking. This is easier than it sounds, and SLURM_JOB_NODELIST is the key. You can pass that variable from the sbatch script to a second shell script that tests $(hostname) and launches the appropriate executable, so that both hostnames are known, like so:

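# $1 is SLURM_JOB_NODELIST, assumed to look like "name-[01-02]"
name=$(echo "$1" | cut -d '-' -f1)
node1=$(echo "$1" | cut -d '-' -f2 | tr -d '[')
node2=$(echo "$1" | cut -d '-' -f3 | tr -d ']')
# The first node runs the server; any other node runs the client against it
if [ "$(hostname)" = "$name-$node1" ]; then
    server.exe
else
    client.exe "$name-$node1"
fi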
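
The cut-based parsing above only works when the node list has exactly the form name-[NN-MM]. As a hedged alternative, here is a minimal sketch that lets SLURM expand the list itself with scontrol show hostnames, which prints one hostname per line. The function names server_host and role_for are made up for illustration, and the sketch assumes that the short name from hostname -s matches the node names SLURM reports, which may not hold on every cluster.

```shell
#!/bin/bash
# Sketch only: server_host and role_for are hypothetical helper names.

# Print the first hostname of a newline-separated host list.
# By convention in this sketch, that node runs the server.
server_host() {
    printf '%s\n' "$1" | head -n 1
}

# Print "server" if this_host is the first host in the list, else "client".
role_for() {
    local hosts=$1 this_host=$2
    if [ "$this_host" = "$(server_host "$hosts")" ]; then
        echo server
    else
        echo client
    fi
}

# Only launch anything inside a SLURM job where scontrol is available.
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    # Expand the compact node list into one hostname per line.
    hosts=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
    # Assumption: the short hostname matches SLURM's node name.
    case $(role_for "$hosts" "$(hostname -s)") in
        server) server.exe ;;
        client) client.exe "$(server_host "$hosts")" ;;
    esac
fi
```

Because srun exports SLURM_JOB_NODELIST to every task, this version reads the variable from the environment, so the sbatch script would not need to pass it as an argument. Outside a SLURM job the launch section is skipped, which makes the two helper functions easy to try by hand.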
