
Streaming DMA in PCIE linux kernel driver

I'm working on an FPGA driver for the Linux kernel. The code seems to work fine on x86, but on x86_64 I've run into a problem. I implemented streaming DMA, and it goes like this:

get_user_pages(...);
for (...) {
    sg_set_page();
}
pci_map_sg();

But pci_map_sg returned addresses like 0xbd285800, which are not aligned to PAGE_SIZE, so I can't send the full first page, because the PCIe specification says:

"Requests must not specify an Address/Length combination which causes a Memory Space access to cross a 4-KB boundary."

Is there any way to get aligned addresses, or did I just miss something important?

Source code of DMA.

The first possibility that comes to mind is that the incoming user buffer does not start on a page boundary. If your start address is 0x800 bytes into a page, then the offset on your first sg_set_page call will be 0x800. This will produce a DMA address ending in 0x800. This is normal behaviour, not a bug.
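To make the offset arithmetic concrete, here is a minimal userspace sketch of how the per-page offset and length could be derived for a buffer that starts partway into a page. These are the values one would typically pass as the len and offset arguments of sg_set_page. The names SKETCH_PAGE_SIZE, page_chunk, pages_spanned and chunk_for_page are hypothetical, and a 4 KiB page size is assumed; this is not taken from the asker's driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical helper type: the part of one pinned page the buffer covers. */
struct page_chunk {
    size_t offset;  /* byte offset of the data inside the page */
    size_t length;  /* number of buffer bytes that live in this page */
};

/* Number of pages touched by the buffer [uaddr, uaddr + len). */
static size_t pages_spanned(uintptr_t uaddr, size_t len)
{
    size_t first_off = uaddr & (SKETCH_PAGE_SIZE - 1);

    if (len == 0)
        return 0;
    return (first_off + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Offset and length for the i-th page of the buffer (i counts from 0). */
static struct page_chunk chunk_for_page(uintptr_t uaddr, size_t len, size_t i)
{
    size_t first_off = uaddr & (SKETCH_PAGE_SIZE - 1);
    size_t consumed;  /* buffer bytes already covered by earlier pages */
    struct page_chunk c;

    assert(i < pages_spanned(uaddr, len));

    if (i == 0) {
        /* Only the first page can start at a non-zero offset. */
        c.offset = first_off;
        consumed = 0;
    } else {
        c.offset = 0;
        consumed = i * SKETCH_PAGE_SIZE - first_off;
    }

    /* Take the rest of the page, or what is left of the buffer. */
    c.length = SKETCH_PAGE_SIZE - c.offset;
    if (c.length > len - consumed)
        c.length = len - consumed;
    return c;
}
```

For a buffer that starts 0x800 bytes into a page and is 0x2000 bytes long, this gives three pages: the first with offset 0x800 and length 0x800, the second a full page, and the third with offset 0 and length 0x800. The non-zero first offset is what shows up in the low bits of the mapped DMA address.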

Because pci_map_sg coalesces pages, this first segment may be larger than one page. The important thing is that pci_map_sg produces contiguous blocks of DMA-addressable memory, but it does not produce a list of low-level PCIe transactions. On x64 you are more likely to get a large region, because most x64 platforms have an IOMMU.

Many devices I deal with have DMA engines that allow me to specify a logical transfer length of several megabytes. Normally the DMA implementation in the PCIe endpoint is responsible for starting a new PCIe transaction at each 4 kB boundary, and the programmer can ignore that constraint. If resources in the FPGA are too limited to handle that, you can consider writing driver code to convert the Linux list of memory blocks into a (much longer) list of PCIe transactions.
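If the splitting has to be done in the driver, the conversion could look roughly like the sketch below. It is plain C that runs in userspace; in a real driver the address and length of each mapped segment would come from sg_dma_address() and sg_dma_len(). The names PCIE_BOUNDARY, pcie_txn and split_at_4k are hypothetical. The sketch handles only the 4 KB boundary rule; a real endpoint must also honour its Max_Payload_Size and Max_Read_Request_Size settings, which may call for smaller pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCIE_BOUNDARY 4096u  /* requests must not cross a 4 KB boundary */

/* Hypothetical descriptor for one transfer handed to the FPGA. */
struct pcie_txn {
    uint64_t addr;  /* bus (DMA) address of the first byte */
    uint32_t len;   /* length in bytes, never crossing a 4 KB boundary */
};

/*
 * Split the DMA region [addr, addr + len) into pieces that each stay
 * inside one 4 KB-aligned window. Writes at most max_out entries to out
 * and returns the total number of pieces required, so the caller can
 * detect that the output array was too small.
 */
static size_t split_at_4k(uint64_t addr, uint64_t len,
                          struct pcie_txn *out, size_t max_out)
{
    size_t n = 0;

    assert(out != NULL || max_out == 0);

    while (len > 0) {
        /* Bytes left before the next 4 KB boundary. */
        uint64_t room = PCIE_BOUNDARY - (addr & (PCIE_BOUNDARY - 1));
        uint64_t this_len = len < room ? len : room;

        if (n < max_out) {
            out[n].addr = addr;
            out[n].len = (uint32_t)this_len;
        }
        n++;
        addr += this_len;
        len -= this_len;
    }
    return n;
}
```

With the address from the question, splitting 0x2000 bytes starting at 0xbd285800 yields three pieces: 0x800 bytes at 0xbd285800, 0x1000 bytes at 0xbd286000, and 0x800 bytes at 0xbd287000. Only the first and last pieces of a segment can be shorter than 4 KB.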
