How to get memory usage information using Slurm C API?

I am looking for a way to get per-job memory usage information from Slurm using the C API, namely memory used and memory reserved. I thought I could get these stats by calling slurm_load_jobs(…), but looking at the job_step_info_t type definition I could not see any relevant fields. Perhaps there is something in job_resrcs, but it is an opaque data type and I have no idea how to use it. Or is there another API call that would give me detailed memory usage info? Please advise.

This question was partially answered in this SO thread, where the focus was only on the compiler errors. The missing portion of the code was the loop over the memory_allocated and memory_used arrays, which are sized according to the number of hosts the job was dispatched to:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "slurm/slurm.h"
#include "slurm/slurm_errno.h"


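/*
 * job_resrcs is opaque in the public API. This layout is copied from Slurm's
 * internal src/common/job_resources.h and must match the installed Slurm
 * version exactly, otherwise the field offsets used below will be wrong.
 */
struct job_resources {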
        bitstr_t *core_bitmap;
        bitstr_t *core_bitmap_used;
        uint32_t  cpu_array_cnt;
        uint16_t *cpu_array_value;
        uint32_t *cpu_array_reps;
        uint16_t *cpus;
        uint16_t *cpus_used;
        uint16_t *cores_per_socket;
        uint64_t *memory_allocated;
        uint64_t *memory_used;
        uint32_t  nhosts;
        bitstr_t *node_bitmap;
        uint32_t  node_req;
        char     *nodes;
        uint32_t  ncpus;
        uint32_t *sock_core_rep_count;
        uint16_t *sockets_per_node;
        uint16_t *tasks_per_node;
        uint8_t   whole_node;

};

int main(int argc, char** argv)
{
        int i, j, slurm_err;
        uint64_t mem_alloc, mem_used;
        job_info_msg_t *jobs;

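        /* Load job info from Slurm; stop if the request fails */
        slurm_err = slurm_load_jobs((time_t) 0, &jobs, SHOW_DETAIL);
        if (slurm_err != SLURM_SUCCESS)
        {
                slurm_perror("slurm_load_jobs");
                return 1;
        }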
        printf("job_id,cluster,partition,user_id,name,job_state,mem_allocated,mem_used\n");
        /* Print job info to stdout in CSV format */
        for (i = 0; i < jobs->record_count; i++)
        {
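                mem_alloc = 0;
                mem_used = 0;
                /* job_resrcs may be NULL (for example, a job with no allocation yet) */
                struct job_resources *res = jobs->job_array[i].job_resrcs;
                /* Sum the per-host values over every host the job runs on */
                for (j = 0; res != NULL && j < res->nhosts; j++)
                {
                        mem_alloc += res->memory_allocated[j];
                        mem_used  += res->memory_used[j];
                }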
                /* The memory totals are 64-bit; cast them to match %llu */
                printf("%u,%s,%s,%u,%s,%u,%llu,%llu\n",
                        jobs->job_array[i].job_id,
                        jobs->job_array[i].cluster,
                        jobs->job_array[i].partition,
                        jobs->job_array[i].user_id,
                        jobs->job_array[i].name,
                        jobs->job_array[i].job_state,
                        (unsigned long long) mem_alloc,
                        (unsigned long long) mem_used
                );
        }
        slurm_free_job_info_msg(jobs);
        return 0;
}

This program compiles and runs without errors. One thing I noticed, though, is that mem_used is either 0 or equal to mem_alloc, which sometimes differs from what I get from the sstat command. I will have to investigate this further...
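
The per-host summation and the CSV formatting in the program above can also be tried out without a Slurm installation. The following is only a minimal sketch using plain arrays as stand-ins for the memory_allocated and memory_used fields; the helper names sum_per_host and format_mem_csv are hypothetical and are not part of the Slurm API.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: add up a per-host array such as memory_allocated
 * or memory_used. A NULL array is treated as "no data" and yields 0.
 */
static uint64_t sum_per_host(const uint64_t *per_host, uint32_t nhosts)
{
        uint64_t total = 0;

        if (per_host == NULL)
                return 0;
        for (uint32_t j = 0; j < nhosts; j++)
                total += per_host[j];   /* index by host, not a fixed 0 */
        return total;
}

/*
 * Hypothetical helper: write "job_id,mem_allocated,mem_used" into buf,
 * using the <inttypes.h> macros so the specifiers match the integer widths.
 * Returns the value reported by snprintf.
 */
static int format_mem_csv(char *buf, size_t len, uint32_t job_id,
                          uint64_t mem_alloc, uint64_t mem_used)
{
        return snprintf(buf, len, "%" PRIu32 ",%" PRIu64 ",%" PRIu64,
                        job_id, mem_alloc, mem_used);
}
```

In the real program, sum_per_host would be called once with memory_allocated and once with memory_used, each together with nhosts, after checking that job_resrcs is not NULL.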
