Python: using Slurm for multiprocessing
I want to run a simple task using multiprocessing (I think this is the same as using parfor in MATLAB, correct?). For example:
from multiprocessing import Pool
import matplotlib.pyplot as plt  # a non-interactive backend (e.g. Agg) is needed for batch jobs

def func_sq(i):
    plt.figure()
    plt.plot(x[i, :])             # x is a ready-to-use large ndarray
    plt.savefig('fig%d.png' % i)  # plot each column on a separate figure
    plt.close()

pool = Pool()
pool.map(func_sq, [1, 2, 3, 4, 5, 6, 7, 8])
But I am very confused about how to use Slurm to submit my job. I have been searching for answers but could not find a good one. Currently, while not using multiprocessing, I am using a Slurm job submit file like this (named test1.sh):
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p batch
#SBATCH --exclusive
module load anaconda3
source activate py36
srun python test1.py
Then I type sbatch test1.sh in my prompt window.
So if I would like to use multiprocessing, how should I modify my sh file? I have tried by myself, but it seems that just changing my -n to 16 and using Pool(16) makes my job repeat 16 times.
Or is there a way to maximize my performance if multiprocessing is not suitable? (I have heard about multithreading but don't know how it exactly works.)
And how do I use my memory effectively so that it won't crash? (My x matrix is very large.)
For the GPU, is it possible to do the same thing? My current submission script without multiprocessing is:
#!/bin/bash
#SBATCH -n 1
#SBATCH -p gpu
#SBATCH --gres=gpu:1
The "-n" flag sets the number of tasks your sbatch submission will execute, which is why your script runs multiple times. What you want to change is the "-c" argument, which is how many CPUs each task is assigned. Your script spawns additional processes, but they will be children of the parent process and share the CPUs assigned to it. Just add "#SBATCH -c 16" to your script.
As for memory, there is a default amount of memory your job is given per CPU, so increasing the number of CPUs also increases the amount of memory available. If you're not getting enough, add something like "#SBATCH --mem=20000M" to request a specific amount.
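Putting those flags together with the original test1.sh, the submission script might look like this (a sketch; the values 16 and 20000M are examples, not requirements, and the module/environment names are taken from the question):

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1              # one task: the single Python process
#SBATCH -c 16             # 16 CPUs for that task's child processes
#SBATCH --mem=20000M      # optional: explicit memory request
#SBATCH -p batch
#SBATCH --exclusive

module load anaconda3
source activate py36
srun python test1.py
```

Inside test1.py, Pool(16) (or just Pool(), which defaults to the CPU count) would then match the 16 CPUs allocated to the single task.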
I don't mean to be unwelcoming here, but this question seems to indicate that you don't actually understand the tools you're using. Python multiprocessing allows a single Python program to launch child processes to help it perform work in parallel. This is particularly helpful because multithreading (which is commonly how you'd accomplish this in other programming languages) doesn't gain you parallel code execution in Python, due to Python's Global Interpreter Lock.
Slurm (which I don't use, but from some quick research) seems to be a fairly high-level utility that allows individuals to submit work to some sort of cluster of computers (or a supercomputer; usually similar concepts). It has no visibility, per se, into how the program it launches runs; that is, it has no relationship to the fact that your Python program goes on to launch 16 (or however many) helper processes. Its job is to schedule your Python program to run as a black box, then sit back and make sure it finishes successfully.
You seem to have some vague data-processing problem. You describe it as a large matrix, but you don't give nearly enough information for me to actually understand what you're trying to accomplish. Regardless, if you don't actually understand what you're doing and how the tools you're using work, you're just flailing until you eventually get lucky enough for this to work. Stop guessing, figure out what these tools do, look around and read documentation, then figure out what you're trying to accomplish and how you could go about splitting up the work in a reasonable fashion.
Here's my best guess, but I really have very little information to work from, so it may not be helpful at all: Pool().map is probably the right direction to be headed in. Create some Python generator that produces rows of your data matrix, then pass that generator and func_sq to pool.map, and sit back and wait for the job to finish. This doesn't sound like a trivial problem, and even if it were, you don't give sufficient details for me to provide a robust answer. There's no "just fix this one line" answer to what you've asked, but I hope this helps give you an idea of what your tools are doing and how to proceed from here.