Linux Driver and API architecture for a data acquisition device
We're trying to write a driver/API for a custom data acquisition device that captures several "channels" of data. For the sake of discussion, let's assume it is a multi-channel video capture device. The device is connected to the system via an x8 PCIe Gen-1 link, which has a theoretical throughput of 16Gbps. Our actual data rate will be around 2.8Gbps (~350MB/sec).
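The figures above can be checked with a little arithmetic. The sketch below assumes the standard PCIe Gen-1 numbers (2.5 GT/s per lane, 8b/10b encoding, per direction); the function names are made up for illustration, and packet-level protocol overhead, which lowers the usable rate further, is ignored.

```c
#include <assert.h>

/* PCIe Gen-1 signals at 2.5 GT/s per lane and uses 8b/10b encoding,
 * so only 8 of every 10 transferred bits are data.
 * Returns the theoretical data rate in Mbit/s for one direction. */
unsigned pcie_gen1_mbps(unsigned lanes)
{
    assert(lanes > 0);
    return lanes * 2500u * 8u / 10u;
}

/* Convert a rate in MB/s to Mbit/s. */
unsigned mbytes_to_mbps(unsigned mbytes_per_sec)
{
    return mbytes_per_sec * 8u;
}
```

For 8 lanes this gives 16000 Mbit/s, and 350MB/sec works out to 2800 Mbit/s, so the required rate is roughly 17.5% of the theoretical link capacity.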
Because of the data rate requirement, we think we have to be careful about the driver/API architecture. We've already implemented a descriptor-based DMA mechanism and the associated driver. For example, we can start a 256KB DMA transaction from the device and it completes successfully. However, in this implementation we only capture the data in the kernel driver and then drop it; we aren't streaming the data to user space at all. Essentially, this is just a small DMA test implementation.
We think we have to separate the problem into three sections: 1. Kernel driver 2. Userspace API 3. User code
The acquisition device has a register in the PCIe address space which indicates, per channel, whether there is data to read from the device. So our kernel driver must poll this bit-vector. When the kernel driver sees a bit set, it starts a DMA transaction. The user application, however, does not need to know about all these DMA transactions and data until an entire chunk of data is ready (for example, assume that the device provides us with 16 lines of video data per transaction, but we need to notify the user only when the entire video frame is ready). We only need to transfer entire frames to the user application.
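The "notify only on a full frame" requirement can be sketched as a small piece of bookkeeping, independent of the DMA code. This is a plain C model, not kernel code; the struct and function names are hypothetical, and it assumes every transaction carries a whole number of lines that divides the frame height evenly.

```c
#include <assert.h>
#include <stdbool.h>

/* Tracks how much of the current frame has arrived for one channel. */
struct frame_assembler {
    unsigned lines_per_frame;       /* frame height, e.g. 480 (assumed) */
    unsigned lines_received;        /* lines accumulated so far */
    unsigned long frames_completed; /* full frames seen so far */
};

/* Record one completed DMA transaction carrying `lines` lines.
 * Returns true only when this transaction completes a frame, which is
 * the single point at which user space would need to be woken up. */
bool assembler_add_chunk(struct frame_assembler *fa, unsigned lines)
{
    assert(fa->lines_received + lines <= fa->lines_per_frame);
    fa->lines_received += lines;
    if (fa->lines_received < fa->lines_per_frame)
        return false;
    fa->lines_received = 0;
    fa->frames_completed++;
    return true;
}
```

With a 480-line frame and 16 lines per transaction, the first 29 calls return false and the 30th returns true.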
Here was our first attempt:
All of the above is working OK, except that the performance is abysmal. We can only achieve a transfer rate of about 2MB/sec. We need to completely rewrite this, and we're open to any suggestions or pointers to examples.
Other notes:
Unfortunately, we cannot change anything in the hardware device. So we must poll for the "data-ready" bit and start DMA based on that bit.
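As a rough illustration of the polling step, decoding the "data-ready" bit-vector into a list of channels could look like this. The 32-bit register width and the function name are assumptions; the real register layout is not given here, and in a driver the value would come from a register read rather than a function argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a data-ready bit-vector: bit N set means channel N has data.
 * Writes the ready channel numbers to `channels` (room for 32 entries)
 * and returns how many were found. */
size_t ready_channels(uint32_t ready_bits, unsigned channels[32])
{
    size_t n = 0;

    assert(channels != NULL);
    for (unsigned ch = 0; ch < 32; ch++) {
        if (ready_bits & (UINT32_C(1) << ch))
            channels[n++] = ch;
    }
    return n;
}
```

The driver would then start one DMA transaction per returned channel.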
Some people suggested looking at the Infiniband drivers as a reference, but we're completely lost in that code.
You're probably way past this now, but if not, here's my 2p.
You need to write a blocking read, which you supply a large memory buffer to. The driver read op (a) gets a list of user pages for your user buffer and locks them in memory (get_user_pages); (b) maps them for DMA as a scatter list with pci_map_sg (dma_map_sg on current kernels); (c) iterates through the list (for_each_sg); (d) for each entry writes the corresponding physical bus address and data length to the DMA controller as what I presume you're calling a 'descriptor'.
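To make steps (c) and (d) concrete, here is a user-space model of turning one large buffer into per-page descriptors. It is only a sketch: the descriptor layout, the 4096-byte page size and all names are assumptions, and plain addresses stand in for the bus addresses that the kernel's DMA mapping would return (a real mapping may also merge adjacent pages into one entry).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

/* Hypothetical descriptor: one contiguous region for the DMA engine. */
struct dma_desc {
    uint64_t bus_addr;
    uint32_t len;
};

/* Split [start, start + len) at page boundaries, one descriptor per
 * piece. Returns the number of descriptors written, or 0 if len is 0
 * or `out` has fewer than the required number of slots. */
size_t build_descriptors(uint64_t start, size_t len,
                         struct dma_desc *out, size_t max_desc)
{
    size_t n = 0;

    assert(out != NULL);
    while (len > 0) {
        /* Bytes left before the next page boundary. */
        size_t room = MODEL_PAGE_SIZE - (size_t)(start % MODEL_PAGE_SIZE);
        size_t chunk = len < room ? len : room;

        if (n == max_desc)
            return 0;
        out[n].bus_addr = start;
        out[n].len = (uint32_t)chunk;
        n++;
        start += chunk;
        len -= chunk;
    }
    return n;
}
```

An 8192-byte buffer that starts 100 bytes into a page yields three descriptors of 3996, 4096 and 100 bytes, which is why a user buffer that is not page-aligned needs one more descriptor than its size suggests.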
The card now has a list of descriptors which correspond to the physical bus addresses of your large user buffer. When data arrives at the card, the card writes it directly into user space, into your user buffer, while your user-level read is still blocked. When it has finished the descriptor list, the card has to be able to interrupt, or it's useless. The driver responds to the interrupt and unblocks your user-level read.
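From the application's side, this design reduces to an ordinary blocking read() into a frame-sized buffer. A minimal sketch follows, assuming a hypothetical helper name; with the driver described above a single read() would be expected to return the whole buffer, so the loop is only a defensive measure against short reads and signal interruptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until `frame_bytes` bytes have been read from `fd` into `buf`.
 * Returns the number of bytes read (less than frame_bytes only at end
 * of file) or -1 on error with errno set by read(). */
ssize_t read_frame(int fd, void *buf, size_t frame_bytes)
{
    unsigned char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < frame_bytes) {
        ssize_t n = read(fd, p + got, frame_bytes - got);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;          /* end of file */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The caller would open the device node once, allocate a buffer large enough for one frame, and call read_frame() in a loop, one call per frame.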
And that's it. The details are nasty, of course, and poorly documented, but that should be the basic architecture. If you really haven't got interrupts, you can set up a timer in the kernel to poll for completion of the transfer, but if it really is a custom card you should get your money back.