
Multiprocessing on SLURM not using multiple CPUs

I am on a SLURM cluster and want to run the following multiprocessing code. The tasks are completely parallelizable, but they still seem to run serially.

Code is:

import multiprocessing

#load data (this is a df of files that need to be processed)
left = loadData()

processes = []

#split the list of files into 22 groups based on column chrom
for i in range(1, 23):
    left_chrom = left[left['chrom'] == i]
    #Pass each DF of files to multiprocessing (note this function calls a subprocess to process the file)
    p_ins = multiprocessing.Process(target=ViewVCFConvert, args=(left_chrom,))
    processes.append(p_ins)
    p_ins.start()
    for process in processes:
        process.join()

My SLURM settings are:

#!/bin/bash
#SBATCH --job-name=VCF
#SBATCH --partition=abc
#SBATCH --nodes=1
#SBATCH --cpus-per-task=22
#SBATCH --mem=1G
#SBATCH --time=10:00:00
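
Before debugging the Python side, it can help to confirm how many CPUs the job actually sees. A minimal check, assuming a Linux node and that it runs inside the SLURM allocation (os.sched_getaffinity is Linux-only, and SLURM_CPUS_PER_TASK is set by SLURM when --cpus-per-task is given):

import multiprocessing
import os

# Logical CPUs on the whole node (not necessarily what SLURM granted)
print("cpu_count:", multiprocessing.cpu_count())

# CPUs this process may actually run on (reflects the affinity/cgroup SLURM set up,
# if those plugins are enabled on the cluster)
print("usable CPUs:", len(os.sched_getaffinity(0)))

# What SLURM itself reports for the task
print("SLURM_CPUS_PER_TASK:", os.environ.get("SLURM_CPUS_PER_TASK"))

If these report 22, the allocation itself is fine and the serial behaviour comes from the Python code.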
   

However, when I run this the files are processed serially. I checked this by adding a print statement that shows when each file is processed. I would expect the output of those print statements to look something like:

file1, chrom=2
file4, chrom=5
file3, chrom=8

Instead the output I get is:

file1, chrom=4
file2, chrom=4
file3, chrom=4

This implies the files are being processed in order (although multiprocessing is doing something, because it does not always start with chrom=1 the way a normal for loop would).

The solution came from another answer ( Python multiprocessing pool inside a loop ). Code is below. Basically I needed to use Pool rather than Process, because I wanted to run ViewVCFConvert in parallel across all of the chromosomes in the list. If I had several different functions and wanted to run them all in parallel for one chrom at a time, then Process would be the right tool. This is why it was still running serially: it was executing ViewVCFConvert once at a time.

from multiprocessing import Pool

def main():
    # one task per chromosome
    chrom = [i for i in range(1, 23)]
    pool = Pool(22)
    pool.map(ViewVCFConvert, chrom)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
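
A small variation on this, not from the original answer: the pool size can be taken from the SLURM allocation instead of hard-coding 22, using the SLURM_CPUS_PER_TASK environment variable (falling back to 22 when it is not set):

import os
from multiprocessing import Pool

def main():
    chroms = list(range(1, 23))
    # Size the pool from what SLURM actually granted via --cpus-per-task
    n_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", 22))
    with Pool(n_workers) as pool:
        pool.map(ViewVCFConvert, chroms)

if __name__ == '__main__':
    main()

This keeps the script working unchanged if --cpus-per-task is later raised or lowered.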

See the documentation ( https://docs.python.org/3/library/multiprocessing.html ) for the difference between Pool and Process.
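
For comparison, the Process-based code from the question can also run in parallel if every process is started first and the join() calls happen only afterwards; in the original code the inner join loop made each process finish before the next one started. A sketch, reusing loadData and ViewVCFConvert from the question:

import multiprocessing

left = loadData()
processes = []

# Start all 22 workers first...
for i in range(1, 23):
    left_chrom = left[left['chrom'] == i]
    p = multiprocessing.Process(target=ViewVCFConvert, args=(left_chrom,))
    processes.append(p)
    p.start()

# ...then wait for them, so the workers overlap instead of running one at a time
for p in processes:
    p.join()

Pool is still the simpler choice here, since it also lets you cap the number of simultaneous workers.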
