
DPDK: forwarding received packets to the default network stack

We're using DPDK (version 20.08 on Ubuntu 20.04, C++ application) to receive UDP packets at a high throughput (>2 Mpps). We use a Mellanox ConnectX-5 NIC (and a Mellanox ConnectX-3 in an older system; it would be great if the solution worked there as well).

By contrast, since we only need to send a few configuration messages, we send those through the default network stack. This way, we can use lots of readily available tools to send configuration messages; however, since all the received data is consumed by DPDK, these tools never get back any responses.

The most prominent issue arises with ARP negotiation: the host tries to resolve addresses, and the clients respond properly; however, these responses are all consumed by DPDK, so the host cannot resolve the addresses and refuses to send the actual UDP packets.

Our idea is to filter out the high-throughput packets in our application and somehow "forward" everything else (e.g. ARP responses) to the default network stack. Does DPDK have a built-in solution for that? Unfortunately, I couldn't find anything in the examples.

I've recently heard about packet(7) sockets, which allow injecting packets via SOCK_DGRAM sockets; that may be a possible solution. I couldn't find a sample implementation for our use case, though. Any help is greatly appreciated.

Theoretically, if the NIC in question supports the embedded switch feature, it should be possible to intercept the packets of interest in the hardware and redirect them to a virtual function (VF) associated with the physical function (PF), with the PF itself receiving everything else.

  • The user configures the SR-IOV feature on the NIC / host, as well as virtualisation support;
  • For a given NIC PF, the user adds a VF and binds it to the corresponding Linux driver;
  • The DPDK application is run with the PF ethdev and a representor ethdev for the VF;
  • To handle the packets in question, the application adds the corresponding flow rules.
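Under stated assumptions (a placeholder PCI address 0000:81:00.0 for the PF, and a Mellanox NIC using the mlx5_core kernel driver), the first two steps might look roughly like this on the host:

```shell
# Steps 1-2 sketch: create one VF under the PF.
# The PCI address is a placeholder; adjust it to the actual device.
echo 1 > /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs

# On Mellanox NICs the new VF is normally picked up by mlx5_core
# automatically, so a regular Linux netdev appears for it.
ip link show    # verify that the VF netdev is now visible
```

On other vendors' NICs, the VF may instead need to be bound to the appropriate kernel driver explicitly (e.g. via driverctl or the sysfs bind interface).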

The PF (ethdev 0) and the VF representor (ethdev 1) have to be explicitly specified by the corresponding EAL argument of the application: -a [pci:dbdf],representor=vf0 .
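Concretely, a testpmd invocation with that EAL argument might look as follows (the PCI address is a placeholder):

```shell
# PF as ethdev port 0 plus a representor for VF 0 as ethdev port 1;
# 0000:81:00.0 is a placeholder PCI address.
dpdk-testpmd -a 0000:81:00.0,representor=vf0 -- -i
```

The same `-a` (allow-list) argument applies to any DPDK application, not just testpmd.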

As for the flow rules, there should be a pair of them.

The first rule's components are as follows:

  • Attribute transfer (demands that matching packets be handled in the embedded switch);
  • Pattern item REPRESENTED_PORT with port_id = 0 (instructs the NIC to intercept packets coming to the embedded switch from the network port represented by the PF ethdev);
  • Pattern items matching on network headers (these provide narrower match criteria);
  • Action REPRESENTED_PORT with port_id = 1 (redirects the packets to the VF).

In the second rule, item REPRESENTED_PORT has port_id = 1, and action REPRESENTED_PORT has port_id = 0 (that is, the rule is the inverse of the first one). Everything else should remain the same.
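Expressed with the rte_flow API (DPDK 21.11 or newer, where item and action REPRESENTED_PORT exist), the first rule could be sketched roughly as below. This is a sketch under assumptions, not a definitive implementation: the port IDs, the ARP EtherType match, and the helper name are chosen for illustration, and error handling is minimal.

```c
#include <rte_flow.h>
#include <rte_ether.h>

/* Sketch: redirect ARP frames arriving from the network port represented
 * by the PF (ethdev 0) to the VF (ethdev 1) in the embedded switch. */
static struct rte_flow *add_arp_redirect_rule(uint16_t proxy_port_id)
{
    struct rte_flow_attr attr = { .transfer = 1 };

    struct rte_flow_item_ethdev pf_port = { .port_id = 0 };
    struct rte_flow_item_eth eth_spec = {
        .hdr.ether_type = RTE_BE16(RTE_ETHER_TYPE_ARP),
    };
    struct rte_flow_item_eth eth_mask = {
        .hdr.ether_type = RTE_BE16(0xffff),
    };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT, .spec = &pf_port },
        { .type = RTE_FLOW_ITEM_TYPE_ETH,
          .spec = &eth_spec, .mask = &eth_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };

    struct rte_flow_action_ethdev vf_port = { .port_id = 1 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &vf_port },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    struct rte_flow_error error;
    return rte_flow_create(proxy_port_id, &attr, pattern, actions, &error);
}
```

The second (inverse) rule would swap the two port_id values; on drivers without REPRESENTED_PORT support, an older item/action pair (e.g. PORT_ID) inserted via the specific ethdevs may be required instead.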

It is important to note that some drivers do not support item REPRESENTED_PORT at the moment. Instead, they expect that the rules be added via the corresponding ethdevs. In that case, for the provided example: the first rule goes to ethdev 0, and the second one goes to ethdev 1.


As per the OP update, the adapter in question might indeed support the embedded switch feature. However, as noted above, item REPRESENTED_PORT might not be supported; in that case, the rules should be inserted via the specific ethdevs. Also, one more attribute, ingress, might need to be specified.

In order to check whether this scheme works, one should be able to deploy a VF (as described above) and run testpmd with the aforementioned EAL argument. In the command line of the application, the two flow rules can be tested as follows:

  • flow create 0 ingress transfer pattern eth type is 0x0806 / end actions represented_port ethdev_port_id 1 / end
  • flow create 1 ingress transfer pattern eth type is 0x0806 / end actions represented_port ethdev_port_id 0 / end

Once done, that should pass ARP packets to the VF (and thus to its network interface). The rest of the packets should be seen by testpmd in active forwarding mode (the start command).

NOTE: it is recommended to switch to the most recent DPDK release.

For the current use case, the best option is to make use of the DPDK TAP PMD (which is part of DPDK on Linux). You can filter the specific packets in software or hardware and then send them to the desired TAP interface.

A simple way to demonstrate this is to use the DPDK skeleton example:

  1. Build the DPDK example via cd [root folder]/examples/skeleton; make static
  2. Pass the desired physical DPDK NIC via the EAL options: ./build/basicfwd -l 1 -w [pcie id of DPDK NIC] --vdev=net_tap0,iface=dpdkTap
  3. In a second terminal, execute ifconfig dpdkTap 0.0.0.0 promisc up
  4. Use tcpdump to capture ingress and egress packets, with tcpdump -eni dpdkTap -Q in and tcpdump -eni dpdkTap -Q out respectively.

Note: you can configure an IP address and set up TC on dpdkTap, and you can also run your custom socket programs on it. You do not need to invest time in TLDK, ANS, or VPP; as per your requirement, you just need a mechanism to inject packets into, and receive packets from, the kernel network stack.
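Whichever path is chosen (hardware flow rules or the TAP PMD with software filtering), the application needs the same classification step between rx_burst and tx_burst: keep the high-throughput UDP flow in the fast path and hand everything else (ARP and other control traffic) to the kernel-facing port. A minimal, self-contained sketch of that decision follows; the frame layout is parsed by hand, and the data-flow UDP port 5000 is an assumed example, not something from the question.

```c
#include <stdint.h>
#include <stddef.h>

#define ETHERTYPE_IPV4  0x0800
#define IPPROTO_UDP_NUM 17

/* Return 1 if the frame belongs to the high-throughput UDP data flow
 * (keep it in the DPDK fast path), or 0 if it should be forwarded to
 * the kernel-facing port (TAP or VF) instead. DATA_PORT is an assumed
 * example; real code would validate lengths and options more carefully. */
static int is_fast_path_frame(const uint8_t *frame, size_t len)
{
    enum { DATA_PORT = 5000 };

    if (len < 14 + 20 + 8)
        return 0;                          /* too short: to kernel     */

    uint16_t ethertype = (uint16_t)(frame[12] << 8 | frame[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return 0;                          /* ARP etc.: to kernel      */

    const uint8_t *ip = frame + 14;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;   /* IPv4 header length   */
    if (ip[9] != IPPROTO_UDP_NUM || len < 14 + ihl + 8)
        return 0;                          /* non-UDP IPv4: to kernel  */

    const uint8_t *udp = ip + ihl;
    uint16_t dst_port = (uint16_t)(udp[2] << 8 | udp[3]);
    return dst_port == DATA_PORT;
}
```

In a basicfwd-style loop, frames for which this returns 0 would be transmitted to the net_tap0 port (or VF) so the kernel stack sees them, while the rest stay in the application.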
