L2Fwd application achieves a good rate when using VFs on the same port, but a low rate is seen when sending traffic from a VF on one port to a VF on a different port
Issue Summary:
On a dual-port 10Gbps NIC, my DPDK application can successfully sustain ~9Gbps of traffic on each port (I receive traffic on one port, process it, and send it back out via the same port; a second instance of the application does the same on the second port).
However, if my application receives traffic on one port and sends it (internally) to the second port, and a different application receives that traffic on the second port, I can receive at most ~3.4Gbps. Beyond this rate packets are dropped, but the imissed count in the DPDK statistics does not increase.
Issue in Detail:
I'm running on a server that has an "X710 for 10GbE SFP+ 1572" Ethernet controller with 2 ports/physical functions. I have created 4 virtual functions on each physical function.
Physical functions:
0000:08:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f0 drv=i40e unused=vfio-pci *Active*
0000:08:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f1 drv=i40e unused=vfio-pci *Active*
Machine specification:
CentOS 7.8.2003
Hardware:
Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
L1d cache: 32K, L1i cache: 32K, L2 cache: 256K, L3 cache: 8192K
NIC: X710 for 10GbE SFP+ 1572
RAM: 70GB
PCI:
Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
LnkSta: Speed 5GT/s, Width x4,
isolcpu: 0,1,2,3,4,5,6,7,8,9,10
NUMA hardware:
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 36094 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 36285 MB
Hugepage: size - 2MB and count - 1024
Model-1 (intra-VF): Running 2 instances of the DPDK l2fwd application, namely App1 and App2. App1 is bound to 2 VFs and 2 cores; App2 is bound to 1 VF and 1 core.
Traffic handling: App1 receives external traffic on VF-0 and sends it out via VF-1. App2 receives external traffic on VF-2 and sends it back out via VF-2 itself.
In this model App1 and App2 together receive 8.8 Gbps and transmit the same without any loss.
Model-2 (inter-VF): I modified the l2fwd application so that App1 sends the external traffic to App2, App2 receives it and sends it back to App1, and App1 sends the traffic out to the external destination.
Traffic handling: App1 receives external traffic on VF0 and sends it to App2 via VF1. App2 receives packets on VF2 and sends them back to App1 via VF2 itself. App1 receives the packets from App2 on VF1 and sends them out to the external destination via VF0.
In this model App1 and App2 together receive only 3.5 Gbps and transmit the same without any loss.
If I try to increase the traffic rate, not all packets sent by App1 are received by App2, and vice versa. Please note that there is no increase in the imissed count in the port-level statistics (leading to the inference that packets are dropped not for lack of CPU cycles but rather in the PCI communication between the VFs).
I came across the following link: https://community.intel.com/t5/Ethernet-Products/SR-IOV-VF-performance-and-default-bandwidth-rate-limiting/mp/277795
However, for me there is no throughput issue in the intra-VF case. My limited understanding is that communication between two different physical functions happens via the PCI Express switch.
Is so much performance deterioration expected (two 10Gbps ports giving a throughput of less than 4 Gbps), and do I therefore need to change my design?
Could it be because of some misconfiguration? Please suggest any pointers to proceed further.
Based on an analysis of the issue, there seems to be a platform-configuration issue that can cause these effects.
Problem: (throughput issue) unable to achieve 20Gbps bidirectional (simulator ingress and application egress via VFs); the maximum receive rate is only ~3.4Gbps.
[Solution] This is most likely due to the following reason.
To identify a PCIe lane issue, use sudo lspci -vvvs [PCIe BDF] | grep Lnk. Compare LnkCap against LnkSta. If there is a mismatch, it is a PCIe lane issue.
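As an illustrative aid (not part of the original answer), the LnkCap/LnkSta comparison can be scripted. The lspci output embedded below is a hypothetical sample; a real check would feed in the actual output of sudo lspci -vvvs [PCIe BDF]:

```python
import re

def parse_link(lspci_text):
    """Extract (speed, width) for the LnkCap and LnkSta lines of lspci -vvv output."""
    out = {}
    for key in ("LnkCap", "LnkSta"):
        m = re.search(rf"{key}:.*?Speed (\S+?GT/s).*?Width x(\d+)", lspci_text)
        if m:
            out[key] = (m.group(1), int(m.group(2)))
    return out

# Hypothetical sample: the slot negotiated 5GT/s x4 although the card is capable of 8GT/s x8.
sample = """\
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <8us
LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

links = parse_link(sample)
if links["LnkCap"] != links["LnkSta"]:
    print("PCIe lane issue:", links)
```

A mismatch like the one in the sample (capability 8GT/s x8 vs. negotiated status 5GT/s x4) is exactly the symptom described above.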
[Edit based on live debug] It has been identified that the issue was indeed the PCIe link. The current Xeon platform only supports PCIe Gen2 x4 lanes, while the X710-T2 card requires PCIe Gen3 x4 lanes.
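Rough back-of-the-envelope arithmetic (my own sketch using the well-known per-generation line encodings, not figures from this thread) shows why a Gen2 x4 link is a bottleneck once traffic has to cross the bus for both ports, as in the inter-VF model:

```python
# Approximate usable PCIe bandwidth per direction, ignoring TLP/DLLP
# protocol overhead (real payload throughput is lower still).
def usable_gbps(gt_per_s, lanes, encoding_efficiency):
    return gt_per_s * lanes * encoding_efficiency

gen2_x4 = usable_gbps(5.0, 4, 8 / 10)      # Gen2 uses 8b/10b encoding
gen3_x4 = usable_gbps(8.0, 4, 128 / 130)   # Gen3 uses 128b/130b encoding

print(f"Gen2 x4: {gen2_x4:.1f} Gbit/s")    # 16.0 Gbit/s raw ceiling
print(f"Gen3 x4: {gen3_x4:.1f} Gbit/s")    # ~31.5 Gbit/s raw ceiling
```

With two 10GbE ports' worth of traffic traversing the bus simultaneously, a 16 Gbit/s per-direction ceiling minus protocol overhead is plausibly consistent with the observed drop to ~3.5 Gbps at the application level.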
Recommendation: upgrade the CPU and motherboard to at least a Broadwell Xeon or better.