DPDK: forward received packets to the default network stack
We're using DPDK (version 20.08 on Ubuntu 20.04, C++ application) to receive UDP packets at high throughput (>2 Mpps). We use a Mellanox ConnectX-5 NIC (and a Mellanox ConnectX-3 in an older system; it would be great if the solution worked there as well).
Conversely, since we only need to send a few configuration messages, we send them through the default network stack. This way, we can use lots of readily available tools to send configuration messages; however, since all the received data is consumed by DPDK, these tools never get back any responses.
The most prominent issue arises with ARP negotiation: the host tries to resolve addresses and the clients respond properly, but these responses are all consumed by DPDK, so the host cannot resolve the addresses and refuses to send the actual UDP packets.
Our idea is to filter out the high-throughput packets in our application and somehow "forward" everything else (e.g. ARP responses) to the default network stack. Does DPDK have a built-in solution for that? Unfortunately, I couldn't find anything in the examples.
I've recently heard about the packet function that allows injecting packets into SOCK_DGRAM sockets, which may be a possible solution. I couldn't find a sample implementation for our use case either, though. Any help is greatly appreciated.
Theoretically, if the NIC in question supports the embedded switch feature, it should be possible to intercept the packets of interest in the hardware and redirect them to a virtual function (VF) associated with the physical function (PF), with the PF itself receiving everything else.
The PF (ethdev 0) and the VF representor (ethdev 1) have to be explicitly specified by the corresponding EAL argument in the application: -a [pci:dbdf],representor=vf0.
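Concretely, with a hypothetical PCI address (and the dpdk-testpmd binary name and -a option of DPDK 20.11+, which replaced the older -w), the invocation might look like:

```shell
dpdk-testpmd -a 0000:3b:00.0,representor=vf0 -- -i
```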
As for the flow rules, there should be a pair of them. The first rule's components are as follows:
- attribute transfer (demands that matching packets be handled in the embedded switch);
- item REPRESENTED_PORT with port_id = 0 (instructs the NIC to intercept packets coming to the embedded switch from the network port represented by the PF ethdev);
- action REPRESENTED_PORT with port_id = 1 (redirects packets to the VF).

In the second rule, item REPRESENTED_PORT has port_id = 1, and action REPRESENTED_PORT has port_id = 0 (that is, this rule is the inverse). Everything else should remain the same.
It is important to note that some drivers do not support item REPRESENTED_PORT at the moment. Instead, they expect that the rules be added via the corresponding ethdevs. This way, for the provided example: the first rule goes to ethdev 0, the second one goes to ethdev 1.
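In C, the pair of rules described above could be installed with the rte_flow API along the following lines. This is only a sketch, assuming DPDK >= 21.11 (which introduced the REPRESENTED_PORT item and action); on 20.08 these symbols do not exist and only driver-specific alternatives apply. Error handling is elided.

```c
#include <stdint.h>
#include <rte_flow.h>

/* Sketch: install one embedded-switch redirection rule.
 * proxy_port: the ethdev through which the rule is submitted;
 * src_ethdev/dst_ethdev: the REPRESENTED_PORT item/action port ids. */
static struct rte_flow *
redirect_rule(uint16_t proxy_port, uint16_t src_ethdev, uint16_t dst_ethdev)
{
    struct rte_flow_attr attr = { .transfer = 1 }; /* handle in the e-switch */
    struct rte_flow_item_ethdev src = { .port_id = src_ethdev };
    struct rte_flow_action_ethdev dst = { .port_id = dst_ethdev };

    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT, .spec = &src },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &dst },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error error;

    return rte_flow_create(proxy_port, &attr, pattern, actions, &error);
}
```

The pair would then be redirect_rule(0, 0, 1) and redirect_rule(0, 1, 0); for drivers that lack REPRESENTED_PORT support and expect per-ethdev insertion, the second rule would instead be submitted via ethdev 1.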
As per the OP update, the adapter in question might indeed support the embedded switch feature. However, as noted above, item REPRESENTED_PORT might not be supported, and the rules should be inserted via the specific ethdevs. Also, one more attribute, ingress, might need to be specified.
In order to check whether this scheme works, one should be able to deploy a VF (as described above) and run testpmd with the aforementioned EAL argument. In the command line of the application, the two flow rules can be tested as follows:
flow create 0 ingress transfer pattern eth type is 0x0806 / end actions represented_port ethdev_port_id 1 / end
flow create 1 ingress transfer pattern eth type is 0x0806 / end actions represented_port ethdev_port_id 0 / end
Once done, that should pass ARP packets to the VF (and thus to the network interface) in question. The rest of the packets should be seen by testpmd in active forwarding mode (start command).
NOTE: it is recommended to switch to the most recent DPDK release.
For the current use case, the best option is to make use of the DPDK TAP PMD (which is part of DPDK on Linux). You can use software or hardware filtering to select the specific packets and then send them to the desired TAP interface.
A simple example to demonstrate this is the DPDK skeleton example:
cd [root folder]/examples/skeleton; make static
./build/basicfwd -l 1 -w [pcie id of DPDK NIC] --vdev=net_tap0,iface=dpdkTap
ifconfig dpdkTap 0.0.0.0 promisc up
Capture ingress and egress packets with tcpdump -eni dpdkTap -Q in and tcpdump -eni dpdkTap -Q out, respectively. Note: you can configure an IP address and set up TC on dpdkTap. You can also run your custom socket programs. You do not need to invest time in TLDP, ANS, or VPP; for your requirement you just need a mechanism to inject packets into and receive packets from the kernel network stack.
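With this setup, the dispatch decision in the forwarding loop boils down to a simple classification: keep only the high-throughput UDP stream in the DPDK fast path and push everything else out through the TAP port. A minimal standalone sketch of that decision logic (a hypothetical helper; in a real basicfwd-style loop these fields would be parsed from the rte_mbuf headers):

```c
#include <stdint.h>

#define ETHERTYPE_IPV4  0x0800
#define IPPROTO_UDP_NUM 17

/* Return 1 if the frame should be forwarded to the kernel via the TAP
 * port, 0 if it belongs to the application's high-throughput UDP path.
 * data_port is the UDP destination port of the high-throughput stream. */
int goes_to_kernel(uint16_t ether_type, uint8_t ip_proto,
                   uint16_t dst_port, uint16_t data_port)
{
    if (ether_type != ETHERTYPE_IPV4)
        return 1;                 /* ARP requests/replies, IPv6 ND, ... */
    if (ip_proto != IPPROTO_UDP_NUM)
        return 1;                 /* ICMP echo, TCP from config tools, ... */
    return dst_port != data_port; /* UDP, but not the data stream */
}
```

Frames for which goes_to_kernel() returns 1 would be transmitted to the TAP ethdev with rte_eth_tx_burst(), after which tools like tcpdump on dpdkTap (and the kernel's ARP handling) see them.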