Linux PCIe driver has periodic long latency for MSI

I have created a PCIe driver for Linux v4.1.15 (non PREEMPT_RT) with a single IRQ which is generated from an MSI from an FPGA. My ISR is:

static irqreturn_t int_handler(int irq, void *dev_id)
{
    /* Count this MSI; my_lock also guards the decrement elsewhere in the driver. */
    spin_lock(&my_lock);
    msi_counter++;
    spin_unlock(&my_lock);

    return IRQ_HANDLED;
}

The FPGA (Cyclone V) sends the MSI every 300 us, and my ISR normally fires very quickly and never misses an interrupt (latency < 5 us). The problem is that about every 3 s (with relatively small jitter on that period) the latency of my ISR jumps to about 1.5 ms to 2 ms. I measured this with a scope by changing my ISR to write a value back to the FPGA and monitoring a pin on the FPGA. The spin_lock for msi_counter is used in only one other place in my code, and only to decrement the counter in the same way my ISR increments it.

I am using an iMX6 quad-core CPU at 1 GHz, and the system runs a bare-bones Yocto image (core-image-minimal), so almost nothing else is running on the CPU. The only other hardware connected to the CPU is the Ethernet, but very little data is being sent over it, and that data is updated more often than once every 3 s.

Questions:

  • How can I identify why Linux periodically increases the latency for my ISR?
  • What can I do to decrease the latency?

Other Info:

  • I've changed the flags passed to request_irq() to IRQF_NO_SUSPEND | IRQF_NO_THREAD, as well as other values, and nothing seems to fix this periodic latency increase.

  • Also, cat /proc/interrupts shows that my ISR only ever runs on core 0 (the first of four cores). I don't know whether this is significant, but I figured it was worth mentioning.

  • Data from the FPGA to CPU is transferred once per MSI (once per 300 us) and the time to transfer the data is a steady 17 us. No data is ever sent from the CPU to the FPGA.

Final Solution:

I built a PREEMPT_RT kernel image and registered my ISR with request_irq() using the flags IRQF_NO_SUSPEND | IRQF_NO_THREAD | IRQF_PERCPU. I also replaced the spin_lock with an atomic_t, but the big improvement came from PREEMPT_RT. I now get an average latency of ~12 us.
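To illustrate the spin_lock-to-atomic change mentioned above, here is a minimal userspace model using C11 atomics. It is only a sketch, not the driver code from the post: the function names are hypothetical, and the "only decrement when positive" guard on the consumer side is an assumption. In the driver itself the corresponding kernel primitives would be atomic_t with atomic_inc() in the ISR and something like atomic_dec_if_positive() on the consumer side.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Pending-MSI counter. Static storage, so it starts at zero.
 * Kernel analogue: static atomic_t msi_counter = ATOMIC_INIT(0); */
static atomic_int msi_counter;

/* Producer side: all the ISR body has to do once the lock is gone.
 * Kernel analogue: atomic_inc(&msi_counter); (hypothetical function name) */
static void msi_counter_on_irq(void)
{
    atomic_fetch_add(&msi_counter, 1);
}

/* Consumer side: claim one pending MSI if there is one.
 * Returns true if the counter was decremented, false if it was already 0.
 * The "never go below zero" guard is an assumption, not from the post. */
static bool msi_counter_consume(void)
{
    int cur = atomic_load(&msi_counter);

    while (cur > 0) {
        /* On failure, cur is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&msi_counter, &cur, cur - 1))
            return true;
    }
    return false;
}
```

Because neither side ever waits on the other, the ISR can no longer be held up by the code that decrements the counter, although, as noted above, this was not where most of the improvement came from.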

You can take a timestamp before and after spin_lock and, if the difference exceeds some threshold, increment another counter. That way you count how often the delay is caused by spin_lock. If that matches how often you see the delay, you can then try to figure out why the other lock holder doesn't release the lock. Can it get preempted while holding the lock?

Another thing to look into: can spin_lock itself take a long time with a low, but non-zero, probability?
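The timing check suggested in this answer could be sketched roughly as follows. The struct and function names are hypothetical, and the bookkeeping is written as plain C so it can be tried outside the kernel. In the ISR, the two timestamps would presumably come from ktime_get_ns() called immediately before and after spin_lock(); the threshold is whatever you choose (for example 100 us, well above the normal < 5 us).

```c
#include <stdint.h>

/* Hypothetical bookkeeping for lock-acquisition times; not from the post. */
struct lock_wait_stats {
    uint64_t samples;   /* number of acquisitions measured */
    uint64_t slow;      /* acquisitions that took longer than the threshold */
    uint64_t max_ns;    /* longest acquisition seen so far, in nanoseconds */
};

/*
 * Record one lock acquisition.
 * before_ns / after_ns: timestamps taken right before and right after
 * spin_lock() (in the kernel, e.g. from ktime_get_ns()).
 */
static void lock_wait_record(struct lock_wait_stats *s,
                             uint64_t before_ns, uint64_t after_ns,
                             uint64_t threshold_ns)
{
    uint64_t waited = after_ns - before_ns;

    s->samples++;
    if (waited > threshold_ns)
        s->slow++;
    if (waited > s->max_ns)
        s->max_ns = waited;
}
```

With one MSI every 300 us, a delay that recurs about every 3 s should show up as roughly one slow sample per 10,000 interrupts. If slow stays at zero while the scope still shows the 1.5 ms to 2 ms spikes, the time is being lost before the handler is entered rather than on the lock.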
