
Linux device driver DMA memory buffer not seen in order by PCIe hardware

I'm developing a device driver for a Xilinx Virtex 6 PCIe custom board. When doing a DMA write (from host to device), here is what happens:

user space app:

a. fill buffer with the following byte pattern (tested up to 16kB)
    00 00 .. 00 (64 bytes)
    01 01 .. 01 (64 bytes)
    ...
    ff ff .. ff (64 bytes)
    00 00 .. 00 (64 bytes)
    01 01 .. 01 (64 bytes)
    etc

b. call custom ioctl to pass pointer to buffer and size
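As a concrete example, the user-space side boils down to something like the sketch below; the device node name (/dev/tdo0), the ioctl request code TDO_IOC_DMA_WRITE and the argument struct are placeholders, since only "pointer to buffer and size" is fixed above.

    /*
     * Minimal user-space sketch of steps a and b. The device node name,
     * the ioctl code and the argument struct are placeholders.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    struct tdo_dma_req {                /* assumed ioctl argument layout */
        void     *buf;
        uint32_t  len;
    };

    #define TDO_IOC_DMA_WRITE _IOW('t', 1, struct tdo_dma_req)

    int main(void)
    {
        uint8_t buf[16 * 1024];
        struct tdo_dma_req req = { .buf = buf, .len = sizeof(buf) };
        size_t i;
        int fd;

        /* a. 64-byte runs of 00, 01, ..., ff, wrapping around */
        for (i = 0; i < sizeof(buf); i += 64)
            memset(buf + i, (i / 64) & 0xff, 64);

        /* b. hand the buffer and size to the driver */
        fd = open("/dev/tdo0", O_RDWR);
        if (fd < 0 || ioctl(fd, TDO_IOC_DMA_WRITE, &req) < 0) {
            perror("dma write");
            return 1;
        }
        close(fd);
        return 0;
    }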

kernel space:

a. retrieve buffer (bufp) with 
    copy_from_user(ptdev->kbuf, bufp, cnt)
b. setup and start DMA 
    b1. //setup physical address
        iowrite32(cpu_to_be32((u32) ptdev->kbuf_dma_addr),
            ptdev->region0 + TDO_DMA_HOST_ADDR);
    b2. //setup transfer size
        iowrite32(cpu_to_be32( ((cnt+3)/4)*4 ), 
            ptdev->region0 + TDO_DMA_BYTELEN);
    b3. //memory barrier to make sure kbuf is in memory
        mb(); 
    //start dma
    b4. iowrite32(cpu_to_be32(TDO_DMA_H2A | TDO_DMA_BURST_FIXED | TDO_DMA_START),
                ptdev->region0 + TDO_DMA_CTL_STAT);
c. put process to sleep
    wait_res = wait_event_interruptible_timeout(ptdev->dma_queue, 
                            !(tdo_dma_busy(ptdev, &dma_stat)), 
                            timeout);
d. check wait_res result and dma status register and return

Note that the kernel buffer is allocated once at device probe with:
ptdev->kbuf = pci_alloc_consistent(dev, ptdev->kbuf_size, /* kbuf_size = 512 kB */
                                &ptdev->kbuf_dma_addr);
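Put together, the ioctl write path looks roughly like the sketch below. The register offsets, bit names and tdo_dma_busy() are the ones from the steps above; the struct fields, the timeout value and the error handling are simplified placeholders.

    #include <linux/io.h>
    #include <linux/jiffies.h>
    #include <linux/uaccess.h>
    #include <linux/wait.h>

    /* struct tdo_dev, TDO_DMA_* and tdo_dma_busy() as defined in the driver */
    static long tdo_dma_write(struct tdo_dev *ptdev, void __user *bufp, size_t cnt)
    {
        unsigned long timeout = msecs_to_jiffies(1000);   /* placeholder */
        u32 dma_stat;
        long wait_res;

        if (cnt > ptdev->kbuf_size)
            return -EINVAL;

        /* a. copy the user buffer into the coherent DMA buffer */
        if (copy_from_user(ptdev->kbuf, bufp, cnt))
            return -EFAULT;

        /* b1/b2. program bus address and length (rounded up to 4 bytes) */
        iowrite32(cpu_to_be32((u32) ptdev->kbuf_dma_addr),
                  ptdev->region0 + TDO_DMA_HOST_ADDR);
        iowrite32(cpu_to_be32(((cnt + 3) / 4) * 4),
                  ptdev->region0 + TDO_DMA_BYTELEN);

        /* b3. make sure the buffer and registers are visible before start */
        mb();

        /* b4. kick off the host-to-adapter transfer */
        iowrite32(cpu_to_be32(TDO_DMA_H2A | TDO_DMA_BURST_FIXED | TDO_DMA_START),
                  ptdev->region0 + TDO_DMA_CTL_STAT);

        /* c. sleep until the device reports completion (or timeout) */
        wait_res = wait_event_interruptible_timeout(ptdev->dma_queue,
                                                    !tdo_dma_busy(ptdev, &dma_stat),
                                                    timeout);

        /* d. check wait result and DMA status */
        return (wait_res > 0) ? 0 : -ETIMEDOUT;
    }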

device PCIe TLP dump (obtained through a logic analyzer after the Xilinx core):

a. TLP received (by the device)
 a1. 40000001 0000000F F7C04808 37900000 (MWr corresponds to b1 above)
 a2. 40000001 0000000F F7C0480C 00000FF8 (MWr corresponds to b2 above)
 a3. 40000001 0000000F F7C04800 00010011 (MWr corresponds to b4 above)

b. TLP sent (by the device)
 b1. 00000080 010000FF 37900000 (MRd 80h DW @ addr 37900000h)
 b2. 00000080 010000FF 37900200 (MRd 80h DW @ addr 37900200h)
 b3. 00000080 010000FF 37900400 (MRd 80h DW @ addr 37900400h)
 b4. 00000080 010000FF 37900600 (MRd 80h DW @ addr 37900600h)
...

c. TLP received (by the device)
 c1. 4A000020 00000080 01000000 00 00 .. 00 01 01 .. 01 CplD 128B
 c2. 4A000020 00000080 01000000 02 02 .. 02 03 03 .. 03 CplD 128B
 c3. 4A000020 00000080 01000000 04 04 .. 04 05 05 .. 05 CplD 128B 
 c4. 4A000020 00000080 01000000 06 06 .. 0A 0A 0A .. 0A CplD 128B  <= 
 c5. 4A000010 00000040 01000040 07 07 .. 07             CplD  64B  <= 
 c6. 4A000010 00000040 01000040 0B 0B .. 0B             CplD  64B  <= 
 c7. 4A000020 00000080 01000000 08 08 .. 08 09 09 .. 09 CplD 128B  <= 
 c8. 4A000020 00000080 01000000 0C 0C .. 0C 0D 0D .. 0D CplD 128B 
.. the remaining bytes are transferred correctly and
the total number of bytes (FF8h) matches the requested size
d. signal interrupt

Now this apparent memory ordering error happens with high probability (0.8 < p < 1), and the ordering mismatch happens at different random points in the transfer.

EDIT: Note that point c4 above would indicate that the memory is not filled in the right order by the kernel driver (I suppose the memory controller fills TLPs with contiguous memory). Since 64B is the cache line size, maybe this has something to do with cache operations.
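One way to rule out the driver side would be a pattern check on kbuf right after copy_from_user() and mb(), before starting the DMA. If a check like the sketch below passes while the device still sees swapped 64-byte chunks, the reordering happens on the PCIe/completion side and not while filling memory.

    /* debugging sketch: verify kbuf holds the expected 64-byte pattern */
    static bool tdo_check_pattern(const u8 *kbuf, size_t cnt)
    {
        size_t i;

        for (i = 0; i < cnt; i++) {
            if (kbuf[i] != (u8)((i / 64) & 0xff)) {
                pr_err("pattern mismatch at offset %zu: %02x\n", i, kbuf[i]);
                return false;
            }
        }
        return true;
    }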

When I disable caching on the kernel buffer with

echo "base=0xaf180000 size=0x00008000 type=uncachable" > /proc/mtrr

the error still happens, but much more rarely (p < 0.1, depending on the transfer size).

This only happens on an i7-4770 (Haswell) based machine (tested on 3 identical machines, with 3 boards). I tried kernel 2.6.32 (RH6.5), stock 3.10.28, and stock 3.13.1 with the same results.

I tried the code and device in an i7-610 QM57 based machine and a Xeon 5400 machine without any issues.

Any ideas/suggestions are welcome.

Best regards

Claudio

I know this is an old thread, but the reason for the "errors" is completion reordering. Multiple outstanding read requests don't have to be answered in order. Completions are only in order for the same request. On top of that, the same tag is always assigned to the requests, which is illegal if the requests are active at the same time.

In the example provided, all MemRd TLPs have the same TAG. You must not reuse a TAG before you have received the last corresponding CplD for that TAG. So if you send a MemRd, wait until you get the CplD with that tag, and only then fire the next MemRd, all your data will arrive in order (but in that case bus utilization will be low and you can't reach high bandwidth).
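To make the tag rule concrete, here is a small conceptual checker (plain C; the decoded-TLP type is hypothetical, meant to model a trace like the dump above) that flags a tag being reused before its final completion has arrived:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    enum tlp_kind { TLP_MRD, TLP_CPLD_LAST, TLP_CPLD_PARTIAL };

    struct tlp_event {
        enum tlp_kind kind;
        uint8_t       tag;      /* transaction tag of the request/completion */
    };

    /* returns false at the first illegal tag reuse */
    static bool check_tag_reuse(const struct tlp_event *ev, size_t n)
    {
        bool outstanding[256] = { false };
        size_t i;

        for (i = 0; i < n; i++) {
            switch (ev[i].kind) {
            case TLP_MRD:                   /* new read request */
                if (outstanding[ev[i].tag]) {
                    printf("event %zu: tag %u reused while still outstanding\n",
                           i, (unsigned)ev[i].tag);
                    return false;
                }
                outstanding[ev[i].tag] = true;
                break;
            case TLP_CPLD_LAST:             /* final completion for this tag */
                outstanding[ev[i].tag] = false;
                break;
            case TLP_CPLD_PARTIAL:          /* more completions still to come */
                break;
            }
        }
        return true;
    }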

Also read this: pci_alloc_consistent uncached memory. It doesn't look like a cache issue on your platform. I would rather debug the device core.

QM57 supports PCIe 2.0,

whereas I imagine the motherboard of the i7-4770 machine supports PCIe 3.0.

I suspect there might be some kind of negotiation failure between the PCIe 3.0 motherboard and your V6 device (which is also PCIe 2.0).
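To check that, the driver could log the negotiated link speed and width at probe time, for example with the standard PCIe capability helpers (available in mainline kernels from roughly 3.7 on); a minimal sketch, assuming pdev is the board's struct pci_dev:

    #include <linux/pci.h>

    static void tdo_log_link_status(struct pci_dev *pdev)
    {
        u16 lnksta = 0;

        if (pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta))
            return;

        /* current link speed field happens to match the PCIe generation */
        dev_info(&pdev->dev, "PCIe link: gen%u x%u\n",
                 lnksta & PCI_EXP_LNKSTA_CLS,
                 (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT);
    }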
