
SLURM batch script doesn't execute Python script, returns no error message and doesn't stop running

In a SLURM cluster I am submitting a shell script that calls a Python script (both scripts can be found below). When the shell script executes, it gets to the point where the Python script is called, but then nothing happens: there is no output, no error message, and the SLURM job keeps running.

I assume the entire contents of the Python script are not relevant (but I included it anyway for completeness). For debugging purposes I inserted the print("script started") line at the very beginning to see if it gets run, but it doesn't. The last thing I see in the output is moved to directory .

I tried calling a test.py script containing print("test") right before this, and it gets executed normally.

What could be the reason the Python script doesn't start, and how can I fix it?

Edit: As user jakub recommended, changing print("script started") to print("script started", flush=True) makes the line get printed successfully. Including several more of these statements revealed that the script was actually running perfectly fine; it just didn't output anything. Including the same statement within the for loop that gets constantly executed also makes all previously missing print() statements get printed.

The question then turns into: why do the print() statements need flush=True in this script but not in other scripts?

Shell script:

#!/bin/bash
#SBATCH --mail-user=lukas.baehler@pathology.unibe.ch
#SBATCH --mail-type=end,fail
#SBATCH --output=out-ncl
#SBATCH --error=err-ncl
#SBATCH --job-name="Mask RCNN nucleus training and detection"

#SBATCH --time=24:00:00
#SBATCH --partition=gpu
#SBATCH --mem-per-cpu=64G

#SBATCH --gres=gpu:gtx1080ti:1
#SBATCH --constraint=rtx2080


conda init bash
source ~/.bashrc
conda activate nucl

cd MRCNN/samples/nucleus
echo "moved to directory"

python nucleus-pipeline2.py splitTMA
echo "Split TMAs"

Python script:


print("script started")

if __name__ == '__main__':
    import argparse
    import os

    # Copied from later in script because the argparse part was moved up and is
    # needed as default in --logs.
    ROOT_DIR = os.path.abspath("../../")
    DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")

    # Parse command line arguments
    parser = argparse.ArgumentParser(
        description='Mask R-CNN for nuclei counting and segmentation')
    parser.add_argument("command",
                        metavar="<command>",
                        help="'splitTMA', 'splitSpot', 'structure', 'train' or 'detect'")
    parser.add_argument('--dataset', required=False,
                        metavar="/path/to/dataset/",
                        help='Root directory of the dataset')
    parser.add_argument('--weights', required=False,
                        metavar="/path/to/weights.h5",
                        help="Path to weights .h5 file or 'coco'")
    parser.add_argument('--logs', required=False,
                        default=DEFAULT_LOGS_DIR,
                        metavar="/path/to/logs/",
                        help='Logs and checkpoints directory (default=logs/)')
    parser.add_argument('--subset', required=False,
                        metavar="Dataset sub-directory",
                        help="Subset of dataset to run prediction on")

    # Own arguments
    parser.add_argument("--input", required=False,
                        metavar="path/to/input/folder",
                        help="Optionally specify the input directory. Should only be used with splitTMA, splitSpot and structure.")
    parser.add_argument("--output", required=False,
                        metavar="path/to/output/folder",
                        help="Optionally specify the output directory. Should only be used with splitTMA, splitSpot and structure.")

    args = parser.parse_args()
    assert args.command in ["train", "detect", "splitTMA", "splitSpot", "structure"], "Must set command."




    ################################################################################
    # splitTMA
    ################################################################################


    # The original script for this is tma-spot.py
    # Splits a TMA into images of its spots.
    if args.command == "splitTMA":


        import os
        import cv2
        import numpy as np
        from openslide import open_slide
        from matplotlib import pyplot as plt


        ###################
        # CONFIGURATION

        # Defines the level of resolution for spot recognition
        level = 7 # Default 7

        # Defines the level of resolution to use for the new images
        newLevel = 0 # Default 0 (highest resolution)

        # Defines the spot size in pixels (has to be changed if newLevel is changed)
        SpotSize = 3072 # Default 3500

        # # Shift values are for alignment of the two slides.
        # shiftX = 445 - 10
        # shiftY = -64 + 10

        print("Using the following parameters:\nlevel = {}\nnewLevel = {}\nSpotSize = {}".format(level, newLevel, SpotSize))
        ###################


        # NUCLEUS_DIR = "MRCNN/samples/nucleus"
        NUCLEUS_DIR = os.path.abspath("")
        os.chdir(NUCLEUS_DIR)

        if args.input:
            INPUT_DIR = args.input
        else:
            INPUT_DIR = "slides"
        print("Using '{}' as input folder.".format(INPUT_DIR))

        if args.output:
            OUTPUT_DIR = args.output
        else:
            OUTPUT_DIR = "spots"
        print("Using '{}' as output folder.".format(OUTPUT_DIR))

        # mrxs_filenames = [filename for filename in os.listdir("slides") if filename[-5:] == ".mrxs"]
        mrxs_filenames = [filename for filename in os.listdir(INPUT_DIR) if filename[-5:] == ".mrxs"]
        print("\nFound {} MIRAX files.".format(len(mrxs_filenames)))

        # Loop through all .mrxs files.
        for filename in mrxs_filenames:

            print("\nReading {}\n".format(filename))

            # filename = mrxs_filenames[0]
            img = open_slide("{}/{}".format(INPUT_DIR, filename))

            # # Use if you want to to see the resolution of all the levels.
            # for i in range(img.level_count):
            #     print("Level", i, "dimension", img.level_dimensions[i],"down factor",img.level_downsamples[i])

            # Use the level set previously and read the slide as an RGB image.
            x_img = img.read_region((0,0), level, img.level_dimensions[level])
            x_img = np.array(x_img)
            rgb = np.zeros_like(x_img)
            rgb[x_img==0] = 255
            rgba_im = cv2.add(rgb,x_img)
            imgLevel = cv2.cvtColor(rgba_im,cv2.COLOR_RGBA2RGB)
            # plt.imsave("./Output/level" + str(level) + ".png", imgLevel) # <---------- USE FOR DEBUGGING

            # Converts the image to gray levels and applies a Gaussian blur.
            gray = cv2.cvtColor(imgLevel, cv2.COLOR_BGR2GRAY)
            gray_blur = cv2.GaussianBlur(gray, (3, 3), 0)
            # cv2.imwrite( "./Output/gray.png", gray_blur) # <-------------------------- USE FOR DEBUGGING

            # Use an Otsu binarization to generate a mask for where tissue is.
            ret3, thresh = cv2.threshold(gray_blur, 8, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            thresh = ~thresh
            cont_img = thresh.copy()
            # cv2.imwrite( "spots/cd3/contour.png", cont_img) # <------------------------ USE FOR DEBUGGING

            # Finds the contours of the mask generated.
            contours, hierarchy = cv2.findContours(cont_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

            # Loop through all contours
            spot_nr = 0
            for cnt in contours:

                # Decide based on the area of the contour if it is a spot
                area = cv2.contourArea(cnt)
                spotInfo = []
                x, y, w, h = cv2.boundingRect(cnt)
                if area < 100 or area > 2000:
                    spotInfo.append([-1, x, y, w, h])
                    continue
                if len(cnt) < 5:
                    spotInfo.append([-1, x, y, w, h])
                    continue

                # Calculate the center of the spot
                centerX = x + int(w/2)
                centerY = y + int(h/2)

                # Calculate how much it needs to be scaled
                factorOld = img.level_downsamples[level]
                factorNew = img.level_downsamples[newLevel]

                # Read the spot region
                spot = img.read_region((int(centerX * factorOld)-int(SpotSize/2),
                                        int(centerY * factorOld)-int(SpotSize/2)),
                                        newLevel, (SpotSize, SpotSize))
                spot = cv2.cvtColor(np.array(spot), cv2.COLOR_RGBA2RGB)

                # Create directory and save the image
                if not os.path.isdir("{}/{}".format(OUTPUT_DIR, filename[:-5])):
                    os.makedirs("{}/{}".format(OUTPUT_DIR, filename[:-5]))
                spot_name =  "{0}/{1}/{1}-{2}.png".format(OUTPUT_DIR, filename[:-5],str(spot_nr).zfill(3))
                plt.imsave(spot_name, spot)
                spot_nr += 1

                print("Spot {} saved - Center X and Y: {}, {}".format(spot_nr, centerX, centerY))
        exit()



    ################################################################################
    # splitSpot
    ################################################################################


    # This is copied from spot-annotation.py
    # Splits spots into tiles
    if args.command == "splitSpot":

        import os
        import sys
        import argparse

        import re
        import numpy as np
        import cv2
        from matplotlib import pyplot as plt


        # VARIABLES

        # Change the resolution of the tiles here. Note the image resolution
        # must be an integer multiple of the tile resolutions (both dimensions).
        tile_resolution = [768, 768]

        # NUCLEUS_DIR = "MRCNN/samples/nucleus"
        NUCLEUS_DIR = os.path.abspath("")
        os.chdir(NUCLEUS_DIR)

        if args.input:
            INPUT_DIR = args.input
        else:
            INPUT_DIR = "spots"
        print("\nUsing '{}' as input folder.".format(INPUT_DIR))

        if args.output:
            OUTPUT_DIR = args.output
        else:
            OUTPUT_DIR = "tiles"
        print("Using '{}' as output folder.".format(OUTPUT_DIR))


        # EXECUTION

        TMA_folders = os.listdir(INPUT_DIR)
        spot_names = []
        spot_count = 0
        for name in TMA_folders:
            spot_names.append(os.listdir("{}/{}".format(INPUT_DIR, name)))
            spot_count += len(spot_names[-1])
        print("\nFound {} TMA folders with a total of {} spot images.".format(len(TMA_folders), spot_count))


        for a, TMA in enumerate(TMA_folders):
            for b, spot in enumerate(spot_names[a]):

                print("TMA: {}/{} - Spot: {}/{}".format(a+1, len(TMA_folders), b+1, len(spot_names[a])), end="\r")


                # Read the image
                img = cv2.imread("{}/{}/{}".format(INPUT_DIR,TMA, spot))

                # Calculate how many tiles will be produced
                tilesX = img.shape[0]/tile_resolution[0]
                tilesY = img.shape[1]/tile_resolution[1]
                assert (tilesX == int(tilesX) and tilesY == int(tilesY)), "Image resolution is not an integer multiple of the tile resolution."
                tilesX, tilesY = int(tilesX), int(tilesY)

                # Create the np array that will hold the tiles
                tiles = np.zeros([tilesY,tilesX,tile_resolution[0],tile_resolution[1],3])

                # Loop through all tiles and store them in tiles
                for i in range(tilesX):
                    for j in range(tilesY):
                        tiles[j,i] = img[i*tile_resolution[0]:(i+1)*tile_resolution[0],
                                        j*tile_resolution[1]:(j+1)*tile_resolution[1]]

                tiles = tiles.astype("uint8")

                # print("\nImage was split into {} tiles.".format(tiles.shape[0]*tiles.shape[1]))

                # Save all the tiles
                for x in range(tiles.shape[0]):
                    for y in range(tiles.shape[1]):
                        # Displays progression
                        # print("Saving {}/{} images...".format(str(x*tiles.shape[0]+y+1),tiles.shape[0]*tiles.shape[1]), end="\r")

                        # Using the plt.imsave() gives alterations in color which is
                        # presumably bad. Using cv2.imwrite() is also ca. 10 times faster.
                        imdir = "{}/{}/{}".format(OUTPUT_DIR, TMA, spot[:-4])
                        imname = "{}-{}-{}.png".format(spot[:-4], str(x).zfill(2), str(y).zfill(2))
                        if not os.path.isdir(imdir):
                            os.makedirs(imdir)


                        cv2.imwrite("{}/{}".format(imdir, imname), tiles[x,y])

        print("\nSaved images in {} as [spot_name]-x-y.png.".format(OUTPUT_DIR))
        exit()



    ################################################################################
    # Prepare Data Structure
    ################################################################################


    # Adapted from prepare-data-structure.py
    # Creates the data structure required for the network
    if args.command == "structure":

        import os
        from shutil import copyfile

        NUCLEUS_DIR = os.path.abspath("")
        os.chdir(NUCLEUS_DIR)

        # Setting input and output directories
        if args.input:
            INPUT_DIR = args.input
        else:
            INPUT_DIR = "tiles"
        print("\nUsing '{}' as input folder.".format(INPUT_DIR))

        if args.output:
            OUTPUT_DIR = args.output
        else:
            OUTPUT_DIR = "data"
        print("Using '{}' as output folder.".format(OUTPUT_DIR))

        # Creates a list with the paths of all tiles. Also stores just the
        # filename itself with and without file extension
        file_names = []
        for path,_,files in os.walk(INPUT_DIR):
            for f in files:
                file_names.append(["{}/{}".format(path, f), f, f[:-4]])
        print("\nFound {} images.".format(len(file_names)))

        assert file_names != [], "No images found in input folder."

        # The dataset needs to be stored inside another folder (default "own_data")
        subset = "own_data"

        # For each file creates the appropriate sub-folders and copies the file.
        skip_count = 0
        for i,info in enumerate(file_names):
            print("Saving {}/{} images.".format(i+1, len(file_names)), end="\r")
            dirname = "{}/{}/{}/images".format(OUTPUT_DIR, subset, info[2])
            try:
                os.makedirs(dirname)
            except:
                skip_count += 1
                continue
            copyfile(info[0], "{}/{}".format(dirname, info[1]))


        print("\n\nSaved dataset in {}/{}".format(OUTPUT_DIR, subset))
        if skip_count > 0:
            print("Skipped {} files because they already existed.".format(skip_count))
        print("")
        exit()

Python buffers stdin, stdout, and stderr by default. print() writes to stdout by default, so you will see this buffered behavior.

From https://stackoverflow.com/a/14258511/5666087 :

Python opens the stdin, stdout, and stderr streams in buffered mode; it reads and writes in larger chunks, keeping data in memory until a threshold is reached.

You can forcibly flush this buffer by passing flush=True to print(). See the documentation for more information. If you have multiple print statements in a row, you need only use flush=True in the last one.
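A minimal sketch of the alternative fix: when stdout is a pipe or a file rather than a terminal (which is the case under sbatch, where output goes to the --output file), CPython block-buffers it. Instead of adding flush=True to every print(), the whole interpreter can be made unbuffered with the standard -u flag or the PYTHONUNBUFFERED environment variable. The child code below is a stand-in for the real script, not the original nucleus-pipeline2.py:

```python
import subprocess
import sys

# When stdout is redirected (as under sbatch), CPython block-buffers it,
# so print() output can sit in memory until the buffer fills or the
# process exits. The "-u" flag (equivalent to PYTHONUNBUFFERED=1 in the
# environment) disables that buffering for the whole interpreter.
child_code = 'print("script started")'  # no flush=True needed with -u

result = subprocess.run(
    [sys.executable, "-u", "-c", child_code],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # -> script started
```

In the batch script this would correspond to replacing python nucleus-pipeline2.py splitTMA with python -u nucleus-pipeline2.py splitTMA, or adding export PYTHONUNBUFFERED=1 before the call, so that output reaches the out-ncl file as soon as it is written.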
