pxssh does not work between compute nodes in a Slurm cluster
I'm using the following script to connect two compute nodes in a Slurm cluster:
```python
from getpass import getuser
from socket import gethostname
from pexpect import pxssh
import sys

# node_list and server_socket are defined elsewhere in the original program.
python = sys.executable
worker_command = "%s -m worker" % python + " %i " + server_socket
pid = 0
children = []
for node, ntasks in node_list.items():
    if node == gethostname():
        continue
    if node != gethostname():
        pid_range = range(pid, pid + ntasks)
        pid += ntasks
        ssh = pxssh.pxssh()
        ssh.login(node, getuser())
        for worker in pid_range:
            ssh.sendline(worker_command % worker + '&')
        children.append(ssh)
```
`node_list` is the dictionary `{'cn000': 28, 'cn001': 28}`. `worker` is a Python file placed in the working directory.
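The worker-id bookkeeping in the loop can be checked locally, without SSH. The helper below has a hypothetical name and is not part of the original program; it reproduces the loop's logic and shows that the node the script runs on is skipped without consuming any ids. With the example `node_list` and the script running on `cn000`, only `cn001` receives workers, with ids 0 to 27.

```python
from typing import Dict


def assign_pid_ranges(node_list: Dict[str, int], local_host: str) -> Dict[str, range]:
    """Mirror the question's loop: give each remote node a contiguous block
    of worker ids, skipping the local host without consuming ids."""
    pid = 0
    ranges: Dict[str, range] = {}
    for node, ntasks in node_list.items():
        if node == local_host:
            continue  # the original loop never launches workers on its own host
        ranges[node] = range(pid, pid + ntasks)
        pid += ntasks
    return ranges


if __name__ == "__main__":
    # Example dictionary from the question; "cn000" plays the local host here.
    print(assign_pid_ranges({'cn000': 28, 'cn001': 28}, local_host='cn000'))
    # {'cn001': range(0, 28)}
```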
I expect `ssh.sendline` to behave the same as `pexpect.spawn`. However, nothing happened after I ran the script.
Although an SSH session is established by `ssh.login(node, getuser())`, the line `ssh.sendline(worker_command % worker)` seems to have no effect: the script that `worker_command` should start is never run.
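`pxssh.pxssh` is a subclass of `pexpect.spawn`, but the two calls do different things: `login()` starts the local `ssh` client, while `sendline()` merely writes a line of text to the remote shell and returns immediately, without waiting for any result. A minimal sketch, assuming an already logged-in session and using the hypothetical helper name `launch_workers`, waits for the shell prompt after every command and keeps what the shell printed, so that errors on the remote side become visible instead of being discarded:

```python
def launch_workers(session, worker_command: str, pid_range, timeout: int = 10) -> list:
    """Send one backgrounded worker command per id over an already logged-in
    pxssh session, waiting for the remote shell prompt after each one."""
    outputs = []
    for worker in pid_range:
        # sendline() only writes a line to the remote shell; it does not
        # start a local process the way constructing pexpect.spawn does.
        session.sendline(worker_command % worker + " &")
        # prompt() returns False if the shell prompt did not reappear in time.
        if not session.prompt(timeout=timeout):
            raise RuntimeError("no shell prompt after launching worker %d" % worker)
        # .before holds the text seen before the prompt: typically the echoed
        # command plus the job number/PID, or an error message.
        outputs.append(session.before)
    return outputs


# Hypothetical usage with the names from the question:
#     ssh = pxssh.pxssh()
#     ssh.login(node, getuser())
#     for text in launch_workers(ssh, worker_command, pid_range):
#         print(text)
```

Because the helper only relies on `sendline`, `prompt` and `before`, it can be exercised with a stand-in object when no cluster is available.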
How can I fix this? Or should I try something else?
How can I create a socket on one compute node and connect it to a socket on another compute node?
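For the second question, one common approach is plain TCP with Python's standard `socket` module: one node listens on a port and the other connects to it by host name. This sketch assumes the compute nodes can reach each other over TCP on the chosen port (no firewall in between); the helper names are hypothetical, and the demo runs both ends on `127.0.0.1` so that it works on a single machine.

```python
import socket
import threading


def start_server(host: str = "0.0.0.0", port: int = 0) -> socket.socket:
    """Listen on (host, port); port 0 lets the OS pick a free port."""
    return socket.create_server((host, port))


def receive_one(server: socket.socket) -> bytes:
    """Accept a single connection and return everything the peer sent."""
    conn, _addr = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the peer closed its end
                break
            chunks.append(data)
    return b"".join(chunks)


def send_message(host: str, port: int, payload: bytes) -> None:
    """Connect to (host, port), e.g. a listener on another node, and send payload."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Local demo: on the cluster the listener would run on e.g. cn000 and
    # the sender would pass "cn000" instead of "127.0.0.1".
    server = start_server("127.0.0.1")
    port = server.getsockname()[1]
    result = []
    t = threading.Thread(target=lambda: result.append(receive_one(server)))
    t.start()
    send_message("127.0.0.1", port, b"hello from worker")
    t.join()
    server.close()
    print(result[0])  # b'hello from worker'
```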
A '%s' is missing from the content of `worker_command`. It contains something like "/usr/bin/python3 -m worker", so `worker_command % worker` should raise an error.
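Which case applies can be checked locally. In Python, `%` binds more tightly than `+`, so in the `worker_command` line as quoted in the question only `"%s -m worker"` is formatted with `python`, and the literal `" %i "` is appended afterwards. Formatting a template that has no placeholder, by contrast, raises `TypeError`. The interpreter path and `server_socket` value below are hypothetical:

```python
def build_worker_command(python: str, server_socket: str) -> str:
    # Same expression as in the question. % binds tighter than +, so only
    # "%s -m worker" is formatted here and the " %i " placeholder survives.
    return "%s -m worker" % python + " %i " + server_socket


def format_worker(template: str, worker: int) -> str:
    """Fill in the worker id; raises TypeError if template has no placeholder."""
    return template % worker


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    template = build_worker_command("/usr/bin/python3", "cn000:5000")
    print(format_worker(template, 3))  # /usr/bin/python3 -m worker 3 cn000:5000
    try:
        format_worker("/usr/bin/python3 -m worker", 3)
    except TypeError as exc:
        print("TypeError:", exc)
```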
If not (which is possible, because this source looks like a short excerpt of the original program), add the string `>>workerprocess.log 2>&1` before the `'&'`, run your program again, and look at `workerprocess.log` on the server. If your `$HOME` is writable on the server, you should find the error message(s) in it.
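The redirection suggested above can be wrapped in a small helper. A minimal sketch with the hypothetical name `with_log`: `>>file` appends the command's stdout to the file and `2>&1` then sends stderr to the same place. A relative log path is resolved against the remote shell's working directory, which right after an SSH login is normally `$HOME`; that is why the log ends up there.

```python
def with_log(command: str, log_file: str = "workerprocess.log") -> str:
    """Append stdout and stderr of command to log_file, then background it."""
    return "%s >>%s 2>&1 &" % (command, log_file)


# Hypothetical use inside the question's loop, replacing the original sendline:
#     ssh.sendline(with_log(worker_command % worker))
# and, later, reading the log back over the same pxssh session:
#     ssh.sendline("tail -n 20 workerprocess.log")
#     ssh.prompt()
#     print(ssh.before)
```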