Issue Summary:
On a dual-port 10 Gbps NIC, my DPDK application can sustain ~9 Gbps of traffic on each port (traffic is received on one port, processed, and sent back out via the same port; a second application does the same on the second port).
However, if the application receives traffic on one port and sends it to the second port internally, where a different application receives it, I can receive at most 3.4 Gbps. Beyond this rate packets are dropped, but the imissed count in the DPDK statistics does not increase.
Issue in Detail:
I'm running on a server with an "X710 for 10GbE SFP+ 1572" Ethernet controller that has 2 ports/physical functions. I have created 4 virtual functions on each physical function.
Physical functions:
0000:08:00.0 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f0 drv=i40e unused=vfio-pci *Active*
0000:08:00.1 'Ethernet Controller X710 for 10GbE SFP+ 1572' if=ens2f1 drv=i40e unused=vfio-pci *Active*
Machine specification:
CentOS 7.8.2003
Hardware:
Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
L1d cache: 32K, L1i cache: 32K, L2 cache: 256K, L3 cache: 8192K
NIC: X710 for 10GbE SFP+ 1572
RAM: 70 GB
PCI:
Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
LnkSta: Speed 5GT/s, Width x4,
isolcpus: 0,1,2,3,4,5,6,7,8,9,10
NUMA hardware:
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 36094 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 36285 MB
Hugepages: 2 MB size, 1024 pages
Model-1 (intra-VF): Running 2 instances of the DPDK l2fwd application, namely App1 and App2. App1 is bound to 2 VFs and 2 cores; App2 is bound to 1 VF and 1 core.
Traffic handling: App1 receives external traffic on VF-0 and sends it out via VF-1. App2 receives external traffic on VF-2 and sends it out via VF-2 itself.
In this model App1 & App2 together receive 8.8 Gbps and transmit the same without any loss.
Model-2 (inter-VF): I modified the l2fwd application so that App1 sends the external traffic to App2; App2 receives it and sends it back to App1, and App1 sends the traffic out to the external destination.
Traffic handling: App1 receives external traffic on VF0 and sends it to App2 via VF1. App2 receives packets on VF2 and sends them back to App1 via VF2 itself. App1 receives packets from App2 on VF1 and sends them out to the external destination via VF0.
In this model App1 & App2 together receive only 3.5 Gbps and transmit the same without any loss.
If I try to increase the traffic rate, not all packets sent by App1 are received by App2, and vice versa. Note that there was no increase in the imissed count in the port-level statistics (leading to the inference that packets were dropped not for lack of CPU cycles, but rather in the PCIe communication between VFs).
I came across the following link: https://community.intel.com/t5/Ethernet-Products/SR-IOV-VF-performance-and-default-bandwidth-rate-limiting/mp/277795
However, in my case there is no throughput issue for intra-VF communication. My limited understanding is that communication between two different physical functions goes via the PCI Express switch.
Is this much performance deterioration expected (two 10 Gbps ports yielding a throughput of less than 4 Gbps), and do I therefore need to change my design?
Could it be caused by some misconfiguration? Please suggest any pointers for proceeding further.
Based on analysis of the issue, there seems to be a platform configuration issue that can cause these effects.
Problem (throughput): unable to achieve 20 Gbps bidirectional (simulator ingress and application egress via VF); the receive rate caps at about 3.4 Gbps.
[Solution] This is most likely due to a PCIe link limitation. To identify a PCIe lane issue, run:
sudo lspci -vvvs [PCIe BDF] | grep Lnk
and compare LnkCap against LnkSta. If there is a mismatch, it is a PCIe lane issue.
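As a concrete sketch of that check: the command below runs the same grep over illustrative sample output (the LnkCap values shown are hypothetical, not captured from this machine), so the mismatch pattern is visible. On the affected host you would instead pipe `sudo lspci -vvvs 0000:08:00.0` into the grep, using the BDF from the question.

```shell
# Illustrative sample: LnkCap says the device can do 8GT/s x8,
# but LnkSta shows the link actually trained at 5GT/s x4 -> mismatch,
# i.e. the slot/platform is limiting the link.
grep -E 'LnkCap:|LnkSta:' <<'EOF'
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <16us
        LnkSta: Speed 5GT/s, Width x4
EOF
```

If both lines report the same speed and width, the link is running at full capability and the bottleneck lies elsewhere.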
[Edit based on live debug] It has been identified that the issue was indeed the PCIe link: the current Xeon platform supports only PCIe gen-2 x4 lanes, while the X710-T2 card requires PCIe gen-3 x4 lanes.
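The back-of-envelope arithmetic behind that conclusion can be sketched as follows, assuming the standard per-generation line encodings (8b/10b for gen 2, 128b/130b for gen 3):

```shell
# Usable PCIe bandwidth of a x4 link, per generation:
#   gen 2: 5 GT/s per lane, 8b/10b encoding   -> 4 Gbps usable per lane
#   gen 3: 8 GT/s per lane, 128b/130b encoding -> ~7.88 Gbps usable per lane
awk 'BEGIN {
  gen2_x4 = 5 * 4 * 8 / 10       # = 16 Gbps: below the ~20 Gbps needed here
  gen3_x4 = 8 * 4 * 128 / 130    # ~= 31.5 Gbps: comfortable headroom
  printf "PCIe gen2 x4: %.1f Gbps usable\n", gen2_x4
  printf "PCIe gen3 x4: %.1f Gbps usable\n", gen3_x4
}'
```

Note that these figures ignore PCIe protocol overhead (TLP headers, flow control), so the achievable payload throughput on a gen-2 x4 link is even lower than 16 Gbps, consistent with the observed ceiling.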
Upgrading the CPU and motherboard to at least a Broadwell Xeon (or newer) is recommended.