How to interpret numademo output

Question

The numademo utility (part of the numactl package) ships with many popular Linux distributions (RHEL, SLES, ...). I tried to find documentation for this tool but could not find any useful information. Either no one is using it, or everyone who uses it already knows all about it.

Here's a sample output:

2 nodes available

memory with no policy memcpy              Avg 10415.77 MB/s Max 10427.37 MB/s Min 10377.83 MB/s
local memory memcpy                       Avg 9499.52 MB/s Max 10423.22 MB/s Min 7239.55 MB/s
memory interleaved on all nodes memcpy    Avg 7355.64 MB/s Max 7657.19 MB/s Min 6284.92 MB/s
memory on node 0 memcpy                   Avg 5837.94 MB/s Max 6073.07 MB/s Min 5067.05 MB/s
memory on node 1 memcpy                   Avg 10285.20 MB/s Max 10425.29 MB/s Min 9206.11 MB/s
memory interleaved on 0 1 memcpy          Avg 7513.01 MB/s Max 7658.31 MB/s Min 6440.88 MB/s

setting preferred node to 0
memory without policy memcpy              Avg 6071.17 MB/s Max 6073.07 MB/s Min 6069.55 MB/s

setting preferred node to 1
memory without policy memcpy              Avg 9126.62 MB/s Max 10427.37 MB/s Min 7236.55 MB/s
manual interleaving to all nodes memcpy   Avg 7357.19 MB/s Max 7656.07 MB/s Min 6439.30 MB/s
manual interleaving on node 0/1 memcpy    Avg 7512.90 MB/s Max 7658.31 MB/s Min 6439.30 MB/s

current interleave node 1
running on node 0, preferred node 0
local memory memcpy                       Avg 10086.53 MB/s Max 10423.22 MB/s Min 8943.84 MB/s
memory interleaved on all nodes memcpy    Avg 6451.66 MB/s Max 6454.36 MB/s Min 6448.01 MB/s
memory interleaved on node 0/1 memcpy     Avg 5199.00 MB/s Max 5200.24 MB/s Min 5196.63 MB/s
alloc on node 1 memcpy                    Avg 5068.47 MB/s Max 5069.99 MB/s Min 5067.05 MB/s
local allocation memcpy                   Avg 10248.81 MB/s Max 10421.15 MB/s Min 8933.17 MB/s

setting wrong preferred node memcpy       Avg 6070.75 MB/s Max 6072.37 MB/s Min 6067.45 MB/s
setting correct preferred node memcpy     Avg 10418.04 MB/s Max 10423.22 MB/s Min 10408.74 MB/s

running on node 1, preferred node 0
local memory memcpy                       Avg 10417.63 MB/s Max 10423.22 MB/s Min 10400.48 MB/s
memory interleaved on all nodes memcpy    Avg 7653.39 MB/s Max 7660.55 MB/s Min 7641.57 MB/s
memory interleaved on node 0/1 memcpy     Avg 6949.18 MB/s Max 7658.31 MB/s Min 5201.27 MB/s
alloc on node 0 memcpy                    Avg 5952.14 MB/s Max 6073.77 MB/s Min 5065.10 MB/s
local allocation memcpy                   Avg 10419.28 MB/s Max 10425.29 MB/s Min 10402.54 MB/s

setting wrong preferred node memcpy       Avg 6069.06 MB/s Max 6073.07 MB/s Min 6059.03 MB/s
setting correct preferred node memcpy     Avg 10248.81 MB/s Max 10423.22 MB/s Min 8946.89 MB/s

How are these tests carried out?

How should these results be interpreted?

For example, what could cause the following numbers to differ so drastically?

memory on node 0 memcpy                   Avg 5837.94 MB/s
memory on node 1 memcpy                   Avg 10285.20 MB/s

Thanks, Harshana

Answer 1

The test is pretty self-explanatory. It uses the functions in libnuma to allocate memory on different NUMA nodes and measures the time it takes to operate on that memory. For example, it appears that your program initially runs on a CPU core in the second NUMA domain (node 1), so accessing memory on node 0 is almost twice as slow. The access speed to interleaved memory usually lies between the speeds of the two domains, roughly their average, since pages are distributed across the nodes in a round-robin fashion.
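To put numbers on this: in the question's output, node 1 memory is about 1.76 times as fast as node 0 memory. For interleaving, a simple model, assuming pages alternate evenly between the nodes and each node's pages are copied at that node's measured speed, predicts the harmonic mean of the per-node rates rather than the plain arithmetic mean, because the copy times for the two halves add up. This is a back-of-the-envelope sketch, not something numademo itself computes; the function name is hypothetical.

```python
def interleaved_rate(rates):
    """Effective MB/s for a buffer spread evenly (round-robin) over several nodes.

    Idealized model: each node holds an equal share of the bytes, that share is
    copied at the node's own rate, and the shares are processed one after
    another, so the per-share times add up. The result is the harmonic mean.
    """
    share = 1.0 / len(rates)  # fraction of the buffer on each node
    seconds_per_mb = sum(share / r for r in rates)
    return 1.0 / seconds_per_mb


# "memory on node 0/1 memcpy" averages from the question, in MB/s
node0, node1 = 5837.94, 10285.20
print(f"node 1 / node 0 speed ratio: {node1 / node0:.2f}")  # 1.76
print(f"arithmetic mean: {(node0 + node1) / 2:.0f} MB/s")  # 8062
print(f"harmonic mean:   {interleaved_rate([node0, node1]):.0f} MB/s")  # 7448
```

With the question's figures the model predicts about 7448 MB/s, close to the measured interleaved results (7355.64 and 7513.01 MB/s), while the plain average (about 8062 MB/s) overshoots. Real hardware can overlap accesses to both nodes, so treat this only as a rough guide.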

"setting preferred node to 0" means that the program tells the OS to prefer allocating memory on node 0. The test that follows confirms that this policy is in effect: the speed is still low, because the program still runs on node 1.

"setting preferred node to 1" tells the OS to allocate memory preferably on node 1. The speeds are therefore higher, since that memory is local to the executing program.

"running on node 0, preferred node 0": the program moves itself to node 0 (libnuma supports CPU binding as well as memory binding) and sets the memory allocator to prefer node 0 as well. The preferred memory location is therefore now local to the executing program, and the speeds are high.
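The reasoning in the last three paragraphs reduces to one question: is the memory on the same node as the CPU doing the copy? Below is a toy Python model of that reasoning. The policy tuples and function names are hypothetical (they are not libnuma's API), and the model ignores details such as fallback when the preferred node is full and how the kernel really chooses interleave nodes.

```python
def page_node(policy, cpu_node, page_index=0):
    """Node a page is expected to land on under a (simplified) memory policy.

    Hypothetical policy encoding, loosely mirroring the numademo headings:
      ("local",)                -> the node of the CPU that touches the page
      ("preferred", n)          -> node n (real kernels fall back if n is full)
      ("bind", n)               -> node n (strict: no fallback to other nodes)
      ("interleave", [n0, ...]) -> round-robin over the listed nodes
    """
    kind = policy[0]
    if kind == "local":
        return cpu_node
    if kind in ("preferred", "bind"):
        return policy[1]
    if kind == "interleave":
        nodes = policy[1]
        return nodes[page_index % len(nodes)]  # simplified round-robin by page
    raise ValueError(f"unknown policy {kind!r}")


def is_local(policy, cpu_node, page_index=0):
    """True if the page sits on the same node as the CPU that accesses it."""
    return page_node(policy, cpu_node, page_index) == cpu_node


# Running on node 1, as inferred above for the first block of tests:
print(is_local(("preferred", 0), cpu_node=1))  # False: remote, like the ~6071 MB/s row
print(is_local(("preferred", 1), cpu_node=1))  # True: local, like the faster row after it
# After the program moves itself to node 0 and prefers node 0:
print(is_local(("preferred", 0), cpu_node=0))  # True
```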

And so on. Just take a look at the source code of the utility.
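As a rough illustration of the kind of measurement loop involved, here is a Python analogue of a single memcpy test that prints a line in the same Avg/Max/Min format. Its structure (copying one half of a buffer over the other, ten timed passes, MB taken as 2**20 bytes) is an assumption made for this sketch; numademo.c itself is written in C and is the authority on what is really measured. Unlike numademo, this sketch does not pick a NUMA node itself.

```python
import time


def memtest(label, nbytes, loops=10):
    """Time `loops` memcpy-like passes over a buffer and print Avg/Max/Min MB/s.

    Assumed structure for illustration only: copy the second half of the
    buffer onto the first half and measure each pass separately. The NUMA node
    of the buffer is whatever the process's memory policy gives (e.g. numactl).
    """
    buf = bytearray(nbytes)  # zero-filled working buffer
    half = nbytes // 2
    view = memoryview(buf)
    mb_per_pass = half / (1024 * 1024)  # MB taken as 2**20 bytes (assumption)
    rates = []
    for _ in range(loops):
        start = time.perf_counter()
        view[:half] = view[half:2 * half]  # bulk copy, analogous to memcpy
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        rates.append(mb_per_pass / elapsed)
    avg = sum(rates) / len(rates)
    print(f"{label:<40} Avg {avg:.2f} MB/s Max {max(rates):.2f} MB/s Min {min(rates):.2f} MB/s")
    return avg, max(rates), min(rates)


if __name__ == "__main__":
    memtest("local memory memcpy", 64 * 1024 * 1024)
```

To mimic the "memory on node 0 while running on node 1" case without libnuma, the script could be started under numactl, for example numactl --cpunodebind=1 --membind=0 python3 memtest_sketch.py, and compared with --membind=1 (memtest_sketch.py is a hypothetical file name). Python adds overhead, so absolute numbers will not match numademo's; only the local/remote difference is meaningful.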

The results are not very symmetric, and the reasons for that are quite complex. Keep in mind that NUMA is poorly implemented on Linux, at least in the 2.6.x kernels (things might have improved in 3.x). For example, the memory allocator tends to coalesce consecutive virtual allocations, after which the memory binding policy is no longer honoured: a region of virtual memory bound to node 0 is sometimes mapped onto pages on node 1. Also, if memory gets swapped out to disk, the NUMA policy is completely ignored when it is brought back in, so memory that was bound to NUMA node 0 might end up on node 1.
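One way to check whether a binding was actually honoured is to look at where the pages ended up. On Linux, /proc/<pid>/numa_maps lists each mapping with its address, its policy, and per-node page counts written as tokens like N0=<pages> and N1=<pages>. A minimal Python sketch that tallies those counts follows; the function names are hypothetical, and the sample lines in the usage are made up for illustration, not taken from the asker's machine.

```python
def node_pages(numa_maps_line):
    """Return {node: pages} parsed from one /proc/<pid>/numa_maps line.

    Lines look like '<address> <policy> [key=value ...]'; per-node page
    counts appear as tokens 'N<node>=<pages>'.
    """
    counts = {}
    for token in numa_maps_line.split()[2:]:  # skip address and policy
        key, sep, value = token.partition("=")
        if sep and key.startswith("N") and key[1:].isdigit():
            counts[int(key[1:])] = int(value)
    return counts


def read_numa_maps(pid="self"):
    """Yield (address, policy, {node: pages}) for every mapping of a process.

    Requires a Linux kernel built with NUMA support.
    """
    with open(f"/proc/{pid}/numa_maps") as f:
        for line in f:
            fields = line.split()
            yield fields[0], fields[1], node_pages(line)


# Illustrative line: a region bound to node 0 with some pages on node 1.
print(node_pages("7f0a1c000000 bind:0 anon=512 dirty=512 N0=384 N1=128 kernelpagesize_kB=4"))
```

A mapping whose policy says bind:0 but whose counts include pages on node 1 is exactly the kind of mismatch described above.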

Answer 2

numademo is a binary provided by the numactl package. It provides a quick overview of the NUMA performance of the system.

The numademo command shows the effect of different memory allocation policies on the system.
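To compare policies side by side, the result lines can be pulled out of numademo's text output and sorted. Each result line has the form "<label> Avg <x> MB/s Max <y> MB/s Min <z> MB/s". A minimal Python sketch (the function name is hypothetical, and the sample is an excerpt of the question's output):

```python
import re

# One numademo result line, e.g.
# "memory on node 0 memcpy   Avg 5837.94 MB/s Max 6073.07 MB/s Min 5067.05 MB/s"
RESULT_RE = re.compile(
    r"^(?P<label>.+?)\s+Avg\s+(?P<avg>[\d.]+)\s+MB/s"
    r"\s+Max\s+(?P<max>[\d.]+)\s+MB/s"
    r"\s+Min\s+(?P<min>[\d.]+)\s+MB/s\s*$"
)


def parse_results(text):
    """Return [(label, avg, max, min)] for every result line in numademo output.

    Headings such as "setting preferred node to 0" or "2 nodes available"
    carry no numbers and are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            rows.append((m["label"], float(m["avg"]), float(m["max"]), float(m["min"])))
    return rows


sample = """2 nodes available
memory on node 0 memcpy                   Avg 5837.94 MB/s Max 6073.07 MB/s Min 5067.05 MB/s
memory on node 1 memcpy                   Avg 10285.20 MB/s Max 10425.29 MB/s Min 9206.11 MB/s
setting preferred node to 0
memory without policy memcpy              Avg 6071.17 MB/s Max 6073.07 MB/s Min 6069.55 MB/s
"""

# List the tests from fastest to slowest average bandwidth.
for label, avg, mx, mn in sorted(parse_results(sample), key=lambda r: r[1], reverse=True):
    print(f"{avg:>10.2f} MB/s  {label}")
```

The same label can appear under different headings (for example "memory without policy memcpy" after each "setting preferred node" line), so a fuller version would also remember the most recent heading line and attach it to each row.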

$ numademo --help
usage: numademo [-S] [-f] [-c] [-e] [-t] msize[kmg] {tests}
No tests means run all.
-c output CSV data. -f run even without NUMA API. -S run stupid tests. -e exit on error
-t regression test; do not run all node combinations
valid tests: memset memcpy forward backward stream random2 ptrchase
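The size argument msize[kmg] takes an optional unit suffix, so an invocation such as numademo 64m memcpy would run only the memcpy test on a 64 MB buffer. A sketch of how such a size string could map to bytes, assuming the suffixes are case-insensitive, 1024-based multipliers (numactl's own parser lives in its C sources and is the authority; the function name here is hypothetical):

```python
def parse_msize(text):
    """Convert a numademo-style size such as '64m' into bytes.

    Assumes the usage string's [kmg] suffixes are binary multiples
    (k = 1024, m = 1024**2, g = 1024**3) and are case-insensitive.
    """
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)  # no suffix: plain bytes


print(parse_msize("64m"))  # 67108864
```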

Details of the valid tests can be found at: https://github.com/jmesmon/numactl/blob/master/numademo.c

2 answers

Answer 1: score 1, posted 2014-01-25 00:46:06

Answer 2: score 0, posted 2014-01-25 01:08:12