L2Fwd application achieves a good rate when using VFs on the same port, but a low rate is seen when sending traffic from a VF on one port to a VF on a different port
Issue Summary:
On a dual-port 10Gbps NIC, my DPDK application can successfully sustain ~9Gbps of traffic on each port (I receive traffic on one port, process it, and send it back out via the same port; a second instance of the application does the same on the second port).
However, if my application receives traffic on one port and sends it (internally) to the second port, and a different application receives that traffic on the second port, I can receive at most ~3.4Gbps. Beyond this rate packets are dropped, but the imissed count in the DPDK statistics does not increase.
Issue in Detail:
I'm running on a server that has an "X710 for 10GbE SFP+ 1572" Ethernet controller with 2 ports/physical functions. I have created 4 virtual functions on each physical function.
Physical functions:
0000:08:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f0 drv=i40e unused=vfio-pci *Active*
0000:08:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f1 drv=i40e unused=vfio-pci *Active*
Machine specification:
CentOS 7.8.2003
Hardware:
Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
L1d cache: 32K, L1i cache: 32K, L2 cache: 256K, L3 cache: 8192K
NIC: X710 for 10GbE SFP+ 1572
RAM: 70GB
PCI:
Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
LnkSta: Speed 5GT/s, Width x4,
isolcpu: 0,1,2,3,4,5,6,7,8,9,10
NUMA hardware:
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 36094 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 36285 MB
Hugepage: size - 2MB and count - 1024
Model-1 (intra-VF): Running 2 instances of the DPDK l2fwd application, namely App1 and App2. App1 is bound to 2 VFs and 2 cores; App2 is bound to 1 VF and 1 core.
Traffic handling: App1 receives external traffic on VF-0 and sends it out via VF-1. App2 receives external traffic on VF-2 and sends it back out via VF-2 itself.
In this model App1 and App2 together receive 8.8 Gbps and transmit the same without any loss.
Model-2 (inter-VF): I modified the l2fwd application so that App1 sends the external traffic to App2, App2 receives it and sends it back to App1, and App1 sends the traffic out to the external destination.
Traffic handling: App1 receives external traffic on VF0 and sends it to App2 via VF1. App2 receives packets on VF2 and sends them back to App1 via VF2 itself. App1 receives the packets from App2 on VF1 and sends them out to the external destination via VF0.
In this model App1 and App2 together receive only 3.5 Gbps and transmit the same without any loss.
If I try to increase the traffic rate, not all packets sent by App1 are received by App2, and vice versa. Please note that there is no increase in the imissed count in the port-level statistics (leading to the inference that packets are dropped not for lack of CPU cycles but rather in the PCI communication between the VFs).
I came across the following link: https://community.intel.com/t5/Ethernet-Products/SR-IOV-VF-performance-and-default-bandwidth-rate-limiting/mp/277795
However, for me there is no throughput issue in the intra-VF case. My limited understanding is that communication between two different physical functions happens via the PCI Express switch.
Is so much performance deterioration expected (two 10Gbps ports giving a throughput of less than 4 Gbps), and do I therefore need to change my design?
Could it be because of some misconfiguration? Please suggest any pointers to proceed further.
Based on an analysis of the issue, there seems to be a platform-configuration issue that can cause these effects.
Problem: (throughput issue) unable to achieve 20Gbps bidirectional (simulator ingress and application egress via VFs); the maximum receive rate is only ~3.4Gbps.
[Solution] This is most likely due to the following reason.
To identify a PCIe lane issue, use sudo lspci -vvvs [PCIe BDF] | grep Lnk. Compare LnkCap against LnkSta. If there is a mismatch, it is a PCIe lane issue.
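As an illustrative aid (not part of the original answer), the LnkCap/LnkSta comparison can be scripted. The lspci output embedded below is a hypothetical sample; a real check would feed in the actual output of sudo lspci -vvvs [PCIe BDF]:

```python
import re

def parse_link(lspci_text):
    """Extract (speed, width) for the LnkCap and LnkSta lines of lspci -vvv output."""
    out = {}
    for key in ("LnkCap", "LnkSta"):
        m = re.search(rf"{key}:.*?Speed (\S+?GT/s).*?Width x(\d+)", lspci_text)
        if m:
            out[key] = (m.group(1), int(m.group(2)))
    return out

# Hypothetical sample: the slot negotiated 5GT/s x4 although the card is capable of 8GT/s x8.
sample = """\
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <8us
LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

links = parse_link(sample)
if links["LnkCap"] != links["LnkSta"]:
    print("PCIe lane issue:", links)
```

A mismatch like the one in the sample (capability 8GT/s x8 vs. negotiated status 5GT/s x4) is exactly the symptom described above.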
[Edit based on live debug] It has been identified that the issue was indeed the PCIe link. The current Xeon platform only supports PCIe Gen2 x4 lanes, while the X710-T2 card requires PCIe Gen3 x4 lanes.
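Rough back-of-the-envelope arithmetic (my own sketch using the well-known per-generation line encodings, not figures from this thread) shows why a Gen2 x4 link is a bottleneck once traffic has to cross the bus for both ports, as in the inter-VF model:

```python
# Approximate usable PCIe bandwidth per direction, ignoring TLP/DLLP
# protocol overhead (real payload throughput is lower still).
def usable_gbps(gt_per_s, lanes, encoding_efficiency):
    return gt_per_s * lanes * encoding_efficiency

gen2_x4 = usable_gbps(5.0, 4, 8 / 10)      # Gen2 uses 8b/10b encoding
gen3_x4 = usable_gbps(8.0, 4, 128 / 130)   # Gen3 uses 128b/130b encoding

print(f"Gen2 x4: {gen2_x4:.1f} Gbit/s")    # 16.0 Gbit/s raw ceiling
print(f"Gen3 x4: {gen3_x4:.1f} Gbit/s")    # ~31.5 Gbit/s raw ceiling
```

With two 10GbE ports' worth of traffic traversing the bus simultaneously, a 16 Gbit/s per-direction ceiling minus protocol overhead is plausibly consistent with the observed drop to ~3.5 Gbps at the application level.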
Recommendation: upgrade the CPU and motherboard to at least a Broadwell Xeon or better.