SLURM job failing with sbatch, successful with srun

A researcher is submitting a job to our cluster that fails when run with sbatch but succeeds when run with srun. Any ideas on why this could be? I've included the error message and the Slurm script below:

Error message:


Unable to init server: Could not connect: Connection refused

(canavier_model_changes_no_plots.py:1589287): Gdk-CRITICAL **: 22:46:57.434: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed

can't open DISPLAY

My first thought based on that error was that the problem lies in the code Slurm is running rather than in Slurm itself, but if that is the case, I'm not sure why srun would work.

Here is the Slurm script:


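#!/bin/bash
# Interpreter line assumed to be bash; sbatch needs one and it is missing from the script as posted.
#SBATCH --job-name=networkmodel
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --mem-per-cpu=4G
#SBATCH --time=00-00:05:00

python3 canavier_model_changes_no_plots.py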

She thought it might have something to do with the matplotlib calls in her code, but it still failed when those were removed. Again, the code runs with srun and fails with sbatch.

The error message indicates that the job is running an X11 application that tries to create a GUI window. Matplotlib may well be the cause. The script should only write files and should not attempt anything that opens a GUI window.
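If Matplotlib is the culprit, one way to follow that advice is to select the non-interactive Agg backend before `matplotlib.pyplot` is imported, and to save figures rather than show them. The sketch below is a minimal illustration, not the asker's code: the helper names `force_headless` and `save_figure` and the output path are hypothetical. It relies on Matplotlib's `MPLBACKEND` environment variable. Since the question reports the failure persisting after the Matplotlib calls were removed, some other import may be initializing GTK, and this sketch would not address that.

```python
import os


def force_headless(environ=os.environ):
    """Select Matplotlib's file-only Agg backend via MPLBACKEND.

    Hypothetical helper. It sets the backend unconditionally, because a
    batch job can inherit a DISPLAY value that points at an X server it
    cannot reach.
    """
    environ["MPLBACKEND"] = "Agg"
    return environ


def save_figure(path):
    """Draw a trivial plot and write it to `path` without opening a window."""
    force_headless()
    # Imported here so that MPLBACKEND is already set when pyplot loads.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    fig.savefig(path)  # write to disk instead of calling plt.show()
    plt.close(fig)
```

Calling `save_figure("trace.png")` inside the job would then produce an image file with no display required. Setting `MPLBACKEND` only affects Matplotlib; other GUI toolkits loaded by the script are untouched.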
