
Unreasonable netperf benchmark results

I used the netperf benchmark with the following commands:

server side: netserver -4 -v -d -N -p

client side: netperf -H -p -l 60 -T 1,1 -t TCP_RR

And I received these results:

MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    9147.83
16384  131072

But then I changed the client to a single CPU (same machine) by adding "maxcpus=1 nr_cpus=1" to the kernel command line, and I ran the following command:

netperf -H -p -l 60 -t TCP_RR

I received these results:

MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.28 () port 0 AF_INET : demo : first burst 0 : cpu bind

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  131072 1        1       60.00    10183.33
16384  131072

Q: I don't understand how performance improved when I decreased the number of CPUs from 64 to 1.

Some technical information: I used the Standard_L64s_v3 instance type on Azure; OS: sles:15:sp2
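(For reference, the kernel parameters mentioned above would typically be added through GRUB on SLES 15; a minimal sketch, assuming the stock /etc/default/grub layout:)

# Append maxcpus=1 nr_cpus=1 to the default kernel command line and rebuild grub.cfg
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&maxcpus=1 nr_cpus=1 /' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
# After rebooting, verify the change took effect
cat /proc/cmdline
nproc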

• The 'netperf' command you executed on the client side is shown below. It is the same before and after changing the number of CPUs on the client side, yet you see an improvement in performance after decreasing the number of vCPUs on the client VM:

netperf -H -p -l 60 -I 1,1 -t TCP_RR

The above command implies that you want to test the network connectivity performance between the 'server' and 'client' hosts for TCP request/response over a period of 60 seconds, with the results reported in the default output path (where the test's pipes are created).

• The CPU utilization measurement mechanism uses '/proc/stat' on Linux to record the time spent on such command executions. The code for this mechanism can be found in 'src/netcpu_procstat.c', so you can check the configuration accordingly.
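As an illustration of that mechanism (a minimal sketch of the same idea, not netperf's actual code), the aggregate 'cpu' line of /proc/stat can be sampled twice and turned into a busy percentage:

# First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
read -r _ u1 n1 s1 i1 w1 q1 sq1 _ < /proc/stat
sleep 5
read -r _ u2 n2 s2 i2 w2 q2 sq2 _ < /proc/stat
busy=$(( (u2 + n2 + s2 + q2 + sq2) - (u1 + n1 + s1 + q1 + sq1) ))
idle=$(( (i2 + w2) - (i1 + w1) ))
echo "CPU busy over the interval: $(( 100 * busy / (busy + idle) ))%"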

Also, the CPU utilization mechanism in a virtual guest environment (i.e., a virtual machine) may not reflect the actual utilization as it would in a bare-metal environment, because much of the networking processing happens outside the context of the virtual machine. Thus, as per the documentation link below from Hewlett-Packard:

https://hewlettpackard.github.io/netperf/doc/netperf.html

If one is looking to measure the added overhead of a virtualization mechanism, rather than rely on CPU utilization, one can rely instead on netperf _RR tests - path-lengths and overheads can be a significant fraction of the latency, so increases in overhead should appear as decreases in transaction rate. Whatever you do, DO NOT rely on the throughput of a _STREAM test. Achieving link-rate can be done via a multitude of options that mask overhead rather than eliminate it.
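To make the quoted advice concrete: for a synchronous 1-byte TCP_RR test, the mean round-trip time is roughly the reciprocal of the transaction rate, so the two runs above correspond to roughly the following latencies:

awk 'BEGIN { printf "64 vCPUs: %.1f us per transaction\n", 1e6 / 9147.83
             printf " 1 vCPU : %.1f us per transaction\n", 1e6 / 10183.33 }'
# prints roughly 109.3 us and 98.2 us respectively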

As a result, I would suggest you rely on other monitoring tools available in Azure, e.g., Azure Monitor, Application Insights, etc.

Looking more closely at your netperf command line:

netperf -H -p -l 60 -T 1,1 -t TCP_RR

The -H option expects a hostname as its argument, and the -p option expects a port number as its argument. As written, the "-p" will be interpreted as a hostname, and when I tried it, it failed. I assume you've omitted some of the command line?
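For illustration only, a complete client invocation would spell out both arguments; in the sketch below 10.0.0.28 is the target shown in the test output above and 12865 is netperf's default control port (the real values were omitted from the question):

netperf -H 10.0.0.28 -p 12865 -l 60 -T 1,1 -t TCP_RR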

The -T option will bind where netperf and netserver run (in this case on vCPU 1 on the netperf side and vCPU 1 on the netserver side), but it will not necessarily control where at least some of the network stack processing takes place. So, in your 64-vCPU setup, the interrupts for the networking traffic, and perhaps the stack, will run on a different vCPU. In your 1-vCPU setup, everything will be on the one vCPU. It is quite conceivable you are seeing the effects of cache-to-cache transfers in the 64-vCPU case leading to lower transactions-per-second rates.
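One way to observe this is to watch where the NIC's interrupts and softirq work actually land during the run; a rough sketch (the interface name eth0 and the IRQ number are assumptions):

# Which vCPUs are servicing the NIC's interrupts?
grep eth0 /proc/interrupts
# CPU affinity for one of those IRQs (substitute a number from the line above)
cat /proc/irq/<irq>/smp_affinity_list
# Per-CPU NET_RX/NET_TX softirq counts, before and after the test
grep -E 'NET_RX|NET_TX' /proc/softirqs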

Going multi-processor will increase aggregate performance, but it will not necessarily increase single-thread/stream performance. And single-thread/stream performance can indeed degrade.
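If aggregate performance is what you care about, a common approach is to run several netperf instances concurrently and sum their per-instance rates; a rough sketch, using the same placeholder host and control port as above:

# Launch 8 concurrent TCP_RR clients; the aggregate rate is the sum of the
# "Trans. Rate" values each instance prints at the end.
for i in $(seq 8); do
  netperf -H 10.0.0.28 -p 12865 -l 60 -t TCP_RR &
done
wait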
