
Python: use Slurm for multiprocessing

I want to run a simple task using multiprocessing (I think this is the same as using parfor in matlab, correct?)

For example:

from multiprocessing import Pool
import matplotlib.pyplot as plt

def func_sq(i):
    fig = plt.figure()        # x is a ready-to-use large ndarray, just want
    plt.plot(x[i, :])         # to plot each column on a separate figure
    fig.savefig(....)
    plt.close(fig)

pool = Pool()
pool.map(func_sq, [1, 2, 3, 4, 5, 6, 7, 8])

But I am very confused about how to use Slurm to submit my job. I have been searching for answers but could not find a good one. Currently, without multiprocessing, I am using a Slurm job submission file like this (named test1.sh):

#!/bin/bash

#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p batch
#SBATCH --exclusive

module load anaconda3
source activate py36
srun python test1.py

Then I type sbatch test1.sh at my prompt.

So if I would like to use multiprocessing, how should I modify my sh file? I have tried it myself, but it seems that just changing -n to 16 and using Pool(16) makes my job repeat 16 times.

Or, if multiprocessing is not suitable, is there a way to maximize my performance? (I have heard about multithreading but don't know exactly how it works.)

And how do I use my memory effectively so that it won't crash? (My x matrix is very large.)

Thanks!

For the GPU, is it possible to do the same thing? My current submission script without multiprocessing is:

#!/bin/bash

#SBATCH -n 1
#SBATCH -p gpu
#SBATCH --gres=gpu:1

Thanks a lot!

The "-n" flag is setting the number of tasks your sbatch submission is going to execute, which is why your script is running multiple times. “ -n”标志设置您的批处理提交将要执行的任务数,这就是脚本多次运行的原因。 What you want to change is the "-c" argument which is how many CPUs each task is assigned. 您要更改的是“ -c”参数,该参数是每个任务分配了多少个CPU。 Your script spawns additional processes but they will be children of the parent process and share the CPUs assigned to it. 您的脚本会产生其他进程,但它们将成为父进程的子进程,并共享分配给它的CPU。 Just add "#SBATCH -c 16" to your script. 只需在脚本中添加“ #SBATCH -c 16”即可。 As for memory, there is a default amount of memory your job will be given per CPU, so increasing the number of CPUs will also increase the amount of memory available. 至于内存,每个CPU将为您的作业分配默认的内存量,因此增加CPU数量也会增加可用内存量。 If you're not getting enough, add something like "#SBATCH --mem=20000M" to request a specific amount. 如果还不够,请添加“ #SBATCH --mem = 20000M”之类的内容以请求特定金额。

I don't mean to be unwelcoming here, but this question seems to indicate that you don't actually understand the tools you're using. Python multiprocessing allows a single Python program to launch child processes to help it perform work in parallel. This is particularly helpful because multithreading (which is commonly how you'd accomplish this in other programming languages) doesn't gain you parallel code execution in Python, due to Python's Global Interpreter Lock.

Slurm (which I don't use, but from some quick research) seems to be a fairly high-level utility that allows individuals to submit work to some sort of cluster of computers (or a supercomputer... usually similar concepts). It has no visibility, per se, into how the program it launches runs; that is, it has no relationship to the fact that your Python program proceeds to launch 16 (or however many) helper processes. Its job is to schedule your Python program to run as a black box, then sit back and make sure it finishes successfully.

You seem to have some vague data processing problem. You describe it as a large matrix, but you don't give nearly enough information for me to actually understand what you're trying to accomplish. Regardless, if you don't actually understand what you're doing and how the tools you're using work, you're just flailing until you maybe eventually get lucky enough for this to work. Stop guessing, figure out what these tools do, look around and read documentation, then figure out what you're trying to accomplish and how you could go about splitting up the work in a reasonable fashion.

Here's my best guess, but I really have very little information to work from, so it may not be helpful at all:

  • Your Python script has no concept that it's being run multiple times by Slurm (the -n 16 you refer to, I guess). It makes sense, then, that the job gets repeated 16 times, because Slurm runs the entire script 16 times, and each time your Python script does the entire task from start to finish. If you want Slurm and your Python program to interact, so that the Python program expects to get run multiple times in parallel, I have no idea how to help you there; you'll just need to read more into Slurm.
  • Your data must be able to be read incrementally, or partially, if you have any hope of breaking this job into pieces. That is, if you can only read the entire matrix all at once, or not at all, you're stuck with solutions that begin by reading the entire matrix into memory, which you indicate is not really an option. Assuming you can, and that you want to perform some work on each row independently, then you're fortunate enough for your task to be what's officially known as "embarrassingly parallel". This is a very good thing, if true.
  • Assuming your problem is embarrassingly parallel (since it looks like you're just trying to load each row of your data matrix, plot it somehow, then save that plot as an image to disk), and you can load your data incrementally, then continue reading up on Python's multiprocessing module, and Pool().map is probably the right direction to be headed in. Create some Python generator that produces rows of your data matrix, then pass that generator and func_sq to pool.map, and sit back and wait for the job to finish (see the sketch after this list).
  • If you really need to do this work across multiple machines, rather than hacking together your own Slurm + multiprocessing stack, I'd suggest you start using actual data processing tools, such as PySpark.
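Here is a minimal sketch of that row-by-row approach, under assumptions of my own: the matrix is stored in a .npy file, the names x.npy, plot_row, and rows are hypothetical, and imap_unordered is used instead of map so the generator is consumed lazily rather than being expanded into a list up front.

import os
import matplotlib
matplotlib.use("Agg")                 # headless backend; no display on a cluster node
import matplotlib.pyplot as plt
import numpy as np
from multiprocessing import Pool

def plot_row(args):
    # plot one row and save it to disk; the file name pattern is hypothetical
    i, row = args
    fig = plt.figure()
    plt.plot(row)
    fig.savefig("row_%05d.png" % i)
    plt.close(fig)                    # release the figure so memory stays flat

def rows(path):
    # yield (index, row) pairs; mmap_mode="r" keeps the full matrix out of RAM
    x = np.load(path, mmap_mode="r")
    for i in range(x.shape[0]):
        yield i, np.array(x[i, :])    # copy only the row that is needed

if __name__ == "__main__":
    # Pool() defaults to os.cpu_count(); sizing it from the Slurm allocation keeps the two in step
    n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    with Pool(processes=n_cpus) as pool:
        for _ in pool.imap_unordered(plot_row, rows("x.npy"), chunksize=16):
            pass

The chunksize argument just batches rows sent to each worker to cut down on pickling overhead; tune it to taste.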

This doesn't sound like a trivial problem, and even if it were, you don't give sufficient details for me to provide a robust answer. There's no "just fix this one line" answer to what you've asked, but I hope this helps give you an idea of what your tools are doing and how to proceed from here.
