Run perf with an MPI application
perf is a performance analysis tool that can report hardware and software events. I am trying to run it with an MPI application to learn how much time the application spends on each core on data transfers and on compute operations.
Normally, I would run my application with
mpirun -np $NUMBER_OF_CORES app_name
and it would spawn processes on several cores, or possibly on several nodes. Is it possible to add perf on top? I've tried
perf stat mpirun -np $NUMBER_OF_CORES app_name
But the output looks like some sort of aggregate for mpirun as a whole. Is there a way to collect perf-type data from each core?
Something like:
mpirun -np $NUMBER_OF_CORES ./myscript.sh
might work with myscript.sh containing:
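#!/bin/bash
# Forward all script arguments to the application ("$@" is the bash form; %* is Windows batch syntax)
perf stat app_name "$@"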
You should add an option to the perf call, such as -o with a per-process file name, so that each process writes a differently named result file.
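A minimal sketch of such a wrapper, assuming the launcher exports the rank in one of OMPI_COMM_WORLD_RANK (Open MPI), PMI_RANK (MPICH-style launchers) or SLURM_PROCID (Slurm). The script name perf-rank.sh, the helper functions and the output file pattern are hypothetical; check which variable your MPI actually sets.

```bash
#!/bin/bash
# Hypothetical wrapper (perf-rank.sh): one perf stat report per MPI rank.

# Take the rank from whichever launcher variable is set (assumed names);
# fall back to the shell PID so file names stay unique either way.
rank_id() {
    echo "${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-$$}}}"
}

# Build an output file name that is unique per host and per rank.
stat_file() {
    echo "perf.stat.$(uname -n).rank$(rank_id).txt"
}

# Run the wrapped command only when one was passed in.
if [ "$#" -gt 0 ]; then
    # -o sends the perf stat summary to a file instead of stderr.
    exec perf stat -o "$(stat_file)" "$@"
fi
```

It could then be launched as mpirun -np $NUMBER_OF_CORES ./perf-rank.sh app_name, leaving one perf.stat.<host>.rank<N>.txt file per process in the working directory of each rank.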
perf can follow spawned child processes. To profile the MPI processes located on the same node, you can simply run
perf stat mpiexec -n 2 ./my-mpi-app
You can use perf record as well. It will create a single perf.data file containing the profiling information for all the local MPI processes. However, this won't let you profile individual MPI ranks.
To get information about individual MPI ranks, you need to run
mpiexec -n 2 perf stat ./my-mpi-app
This will profile the individual ranks and will also work across multiple nodes. However, it does not work as-is with some perf commands, such as perf record.
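One possible workaround for perf record, assuming the obstacle is that every rank writes to the same default perf.data file, is to give each rank its own output file through a small wrapper. The script name perf-record-rank.sh, the helper function and the file pattern are hypothetical, and the rank variables (OMPI_COMM_WORLD_RANK for Open MPI, PMI_RANK for MPICH-style launchers) are assumptions about your MPI.

```bash
#!/bin/bash
# Hypothetical wrapper (perf-record-rank.sh): one perf.data file per rank.

# Name the output after host and rank so files do not collide.
# The rank variables are assumed launcher exports; the shell PID is
# used when neither is set.
record_file() {
    local_rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-$$}}"
    echo "perf.data.$(uname -n).rank${local_rank}"
}

# Run the wrapped command only when one was passed in.
if [ "$#" -gt 0 ]; then
    # -o selects the output file instead of the default perf.data.
    exec perf record -o "$(record_file)" "$@"
fi
```

Launched as mpiexec -n 2 ./perf-record-rank.sh ./my-mpi-app, each rank should leave its own file, which can then be inspected individually with perf report -i followed by that file's name.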