
Is it a kernel freeze?

We are missing interrupts in a Linux embedded system with a multi-core processor running at 1.25 GHz.

Background:

  • kernel version: 2.6.32.27
  • We have user-space processes which need real-time performance.
  • They operate on a 1 ms boundary.
    • That is to say, within each 1 ms they are expected to complete a set of tasks, which at most may take about 800 us.
  • We have an external FPGA component which provides 1 ms and 10 ms interrupts to the multi-core processor through GPIO pins configured as edge-triggered interrupts.
  • These interrupts are handled in a kernel driver (a hypothetical registration sketch follows this list).
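
As an illustration, a minimal sketch of how a 2.6.32 driver might register these edge-triggered GPIO interrupts. The GPIO numbers, the IRQ name strings, and the helper register_fpga_irqs() are assumptions; the two handlers are the ones from the skeletal code further below:

#include <linux/gpio.h>
#include <linux/interrupt.h>

/* handlers as defined in the skeletal code below */
static irqreturn_t gpio_int_handler_1ms(int irq, void *irq_arg);
static irqreturn_t gpio_int_handler_10ms(int irq, void *irq_arg);

#define GPIO_1MS   100   /* assumed GPIO pin wired to the FPGA 1ms line  */
#define GPIO_10MS  101   /* assumed GPIO pin wired to the FPGA 10ms line */

static int register_fpga_irqs(void)
{
    int ret;

    /* map each GPIO to an IRQ number and request an edge-triggered interrupt */
    ret = request_irq(gpio_to_irq(GPIO_1MS), gpio_int_handler_1ms,
                      IRQF_TRIGGER_RISING, "fpga-1ms", NULL);
    if (ret)
        return ret;

    return request_irq(gpio_to_irq(GPIO_10MS), gpio_int_handler_10ms,
                       IRQF_TRIGGER_RISING, "fpga-10ms", NULL);
}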

The software architecture is such that the user process, after completing its work, does an ioctl into the GPIO driver.

In this ioctl the driver puts the process into an interruptible wait (wait_event_interruptible). Whenever the next 1 ms interrupt is received, the ISR wakes the process up. This cycle repeats.
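
For illustration, a minimal sketch of that user-space cycle, assuming the IOCTL_WAIT_ON_EVENT command and CurrentTickProfile_t type from the skeletal code below; the device path, header name, and do_1ms_work() are hypothetical:

#include <fcntl.h>
#include <sys/ioctl.h>
#include "gpio_int_ioctl.h"   /* hypothetical header exporting IOCTL_WAIT_ON_EVENT
                               * and CurrentTickProfile_t */

extern void do_1ms_work(void);   /* hypothetical workload, finishes in ~800us */

int main(void)
{
    CurrentTickProfile_t tick;                 /* filled in by the driver */
    int fd = open("/dev/gpio_int", O_RDWR);    /* hypothetical device node */

    if (fd < 0)
        return 1;

    for (;;)
    {
        /* blocks inside the driver until the ISR wakes us on the next tick */
        if (ioctl(fd, IOCTL_WAIT_ON_EVENT, &tick) < 0)
            break;

        do_1ms_work();
    }
    return 0;
}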

Both the 1 ms and 10 ms interrupts are routed to a single core of the processor using smp_affinity.
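
Such pinning is typically done by writing a CPU bitmask to /proc/irq/<irq>/smp_affinity. A small sketch, where the IRQ numbers 56 and 57 are assumptions for illustration:

#include <stdio.h>

/* Write a hex CPU bitmask to /proc/irq/<irq>/smp_affinity (0x1 = CPU0 only). */
static int pin_irq(int irq, unsigned int mask)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%x\n", mask);
    return fclose(f);
}

int main(void)
{
    pin_irq(56, 0x1);   /* 1ms GPIO interrupt  -> CPU0 (IRQ number assumed) */
    pin_irq(57, 0x1);   /* 10ms GPIO interrupt -> CPU0 (IRQ number assumed) */
    return 0;
}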

Problem:

  • Sometimes we find that some interrupts are missed.
    • (i.e. the ISR itself doesn't get invoked).
  • After 12 to 20 ms, the ISRs are hit normally again.
  • We can observe this by profiling the duration between consecutive ISR calls, and by incrementing counters as the first thing in the ISR.

This mostly happens during high system load at the process level, and it is random and hard to reproduce.

I have attached the skeletal code.

First I have to isolate whether it is a hardware or a software problem. Since it is an FPGA that generates the interrupts, we don't have much doubt about the hardware.

Is this a kernel freeze? It seems the most likely explanation, since the CPU cycle counter keeps incrementing.

Could it be a CPU freeze due to thermal conditions? If so, the CPU cycle counter wouldn't have incremented in the first place.

Any pointers to debug/isolate the root cause would be of great help, considering the kernel version we are working on and the profiling/debugging facilities available in it.

skeletal code:

/* Build time Configuration */

/* Macros */
DECLARE_WAIT_QUEUE_HEAD(wait);

/** Structure Definitions */
/** Global Variables */
gpio_dev_t gpio1msDev, gpio10msDev;
GpioIntProfileSectorData_t GpioSigProfileData[MAX_GPIO_INT_CONSUMERS];
GpioIntProfileSectorData_t *ProfilePtrSector;
GpioIntProfileData_t GpioProfileData;
GpioIntProfileData_t *GpioIntProfilePtr;
CurrentTickProfile_t TimeStamp;
uint64_t ModuleInitDone = 0, FirstTimePIDWrite = 0;
uint64_t PrevCycle = 0, NowCycle = 0;
volatile uint64_t TenMsFlag, OneMsFlag;
uint64_t OneMsCounter;  /* 1ms tick count within the current 10ms frame */
uint64_t OneMsIsrTime, TenMsIsrTime;
uint64_t OneMsTime, TenMsTime, SyncStarted;
uint64_t Prev = 0, Now = 0, DiffTen = 0, DiffOne, SesSyncHappened;
static DEFINE_SPINLOCK(GpioSyncLock);
static DEFINE_SPINLOCK(IoctlSyncLock);
uint64_t EventPresent[MAX_GPIO_INT_CONSUMERS];

GpioEvent_t CurrentEvent = KERN_NO_EVENT;
TickSyncSes_t *SyncSesPtr = NULL;


/** Function Declarations */

ssize_t write_pid(struct file *filep, const char __user * buf, size_t count, loff_t * ppos);
long Gpio_compat_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);

static const struct file_operations my_fops = {
    .write        = write_pid,
    .compat_ioctl = Gpio_compat_ioctl,
};




/**
 * IOCTL function for GPIO interrupt module
 *
 * @return
 */
long Gpio_compat_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
{
    int status = 0;
    uint8_t Instance;
    uint64_t *EventPtr;
    GpioIntProfileSectorData_t *SectorProfilePtr, *DebugProfilePtr;
    GpioEvent_t EventToGive = KERN_NO_EVENT;
    pid_t CurrentPid = current->pid;

    spin_lock(&IoctlSyncLock);  // Take the spinlock
    Instance = GetSector(CurrentPid);
    SectorProfilePtr = &GpioSigProfileData[Instance];
    EventPtr = &EventPresent[Instance];
    spin_unlock(&IoctlSyncLock);

    if (Instance < MAX_GPIO_INT_CONSUMERS)   /* arrays hold MAX_GPIO_INT_CONSUMERS entries */
    {
        switch (cmd)
        {
        case IOCTL_WAIT_ON_EVENT:
            if (*EventPtr)
            {
                /* Don't block here, since the interrupt has already happened
                 * before the process called the polling API */
                *EventPtr = 0;
                /* some profiling code */
            }
            else
            {
                status = wait_event_interruptible(wait, (*EventPtr == 1));
                *EventPtr = 0;
            }

            /* profiling code */

            TimeStamp.CurrentEvent = EventToGive;
            /* copy_to_user() returns the number of bytes it could NOT copy */
            if (copy_to_user((void __user *)arg, &TimeStamp, sizeof(CurrentTickProfile_t)))
                return -EFAULT;
            break;
        default:
            break;
        }
    }
    else
    {
        return -EINVAL;
    }

    return 0;
}

/**
 * Send signals to registered PID's.
 *
 * @return
 */
static void WakeupWaitQueue(GpioEvent_t Event)
{
    int i;

    /* some profile code */

    CurrentEvent = Event;

    // we don't wake up the debug app, hence "< MAX_GPIO_INT_CONSUMERS" is used in the for loop
    for (i = 0; i < MAX_GPIO_INT_CONSUMERS; i++)
    {
        EventPresent[i] = 1;
    }
    wake_up_interruptible(&wait);
}

/**
 * 1ms Interrupt handler
 *
 * @return
 */
static irqreturn_t gpio_int_handler_1ms(int irq, void *irq_arg)
{
    uint64_t reg_read, my_core_num;
    unsigned long flags;
    GpioEvent_t event = KERN_NO_EVENT;

    /* code to clear the interrupt registers */


    /************ profiling start************/
    NowCycle = get_cpu_cycle();
    GpioIntProfilePtr->TotalOneMsInterrupts++;

    /* Check the max diff between consecutive interrupts */
    if (PrevCycle)
    {
        DiffOne = NowCycle - PrevCycle;
        if (DiffOne > GpioIntProfilePtr->OneMsMaxDiff)
            GpioIntProfilePtr->OneMsMaxDiff = DiffOne;
    }
    PrevCycle = NowCycle;

    TimeStamp.OneMsCount++; /* increment the counter */

    /* Store the timestamp */

    GpioIntProfilePtr->Gpio1msTimeStamp[GpioIntProfilePtr->IndexOne] = NowCycle;
    TimeStamp.OneMsTimeStampAtIsr = NowCycle;
    GpioIntProfilePtr->IndexOne++;
    if (GpioIntProfilePtr->IndexOne == GPIO_PROFILE_ARRAY_SIZE)
        GpioIntProfilePtr->IndexOne = 0;
    /************ profiling end************/

    /*
     * When the 10ms interrupt fires, we send only one event to the upper
     * layers, so the 1ms and 10ms interrupts must be kept in sync.
     * Either interrupt may arrive first.
     */
    /******** Sync mechanism ***********/
    spin_lock_irqsave(&GpioSyncLock, flags);    // Take the spinlock
    OneMsCounter++;
    OneMsTime = NowCycle;
    DiffOne = OneMsTime - TenMsTime;

    if (DiffOne < MAX_OFFSET_BETWEEN_1_AND_10MS)    //ten ms has happened first
    {
        if (OneMsCounter == 10)
        {
            event = KERN_BOTH_EVENT;
            SyncStarted = 1;
        }
        else
        {
            if (SyncStarted)
            {
                if (OneMsCounter < 10)
                {
                    GpioIntProfilePtr->TickSyncErrAt1msLess++;
                }
                else if (OneMsCounter > 10)
                {
                    GpioIntProfilePtr->TickSyncErrAt1msMore++;
                }
            }
        }
        OneMsCounter = 0;
    }
    else
    {
        if (OneMsCounter < 10)
        {
            if (SyncStarted)
            {
                event = KERN_ONE_MS_EVENT;
            }
        }
        else if (OneMsCounter > 10)
        {
            OneMsCounter = 0;
            if (SyncStarted)
            {
                GpioIntProfilePtr->TickSyncErrAt1msMore++;
            }
        }
    }
    TimeStamp.SFN = OneMsCounter;
    spin_unlock_irqrestore(&GpioSyncLock, flags);
    /******** Sync mechanism ***********/

    if(event != KERN_NO_EVENT)
        WakeupWaitQueue(event);

    OneMsIsrTime = get_cpu_cycle() - NowCycle;
    if (GpioIntProfilePtr->Max1msIsrTime < OneMsIsrTime)
        GpioIntProfilePtr->Max1msIsrTime = OneMsIsrTime;
    return IRQ_HANDLED;
}

/**
 * 10ms Interrupt handler
 *
 * @return
 */
static irqreturn_t gpio_int_handler_10ms(int irq, void *irq_arg)
{
    uint64_t reg_read, my_core_num;
    unsigned long flags;
    GpioEvent_t event = KERN_NO_EVENT;

    /* clear the interrupt */

    /************ profiling start************/
    GpioIntProfilePtr->TotalTenMsInterrupts++;
    Now = get_cpu_cycle();
    if (Prev)
    {
        DiffTen = Now - Prev;
        if (DiffTen > GpioIntProfilePtr->TenMsMaxDiff)
            GpioIntProfilePtr->TenMsMaxDiff = DiffTen;
    }
    Prev = Now;
    TimeStamp.OneMsCount++; /* increment the counter */
    TimeStamp.TenMsCount++;
    GpioIntProfilePtr->Gpio10msTimeStamp[GpioIntProfilePtr->IndexTen] = Now;
    TimeStamp.TenMsTimeStampAtIsr = Now;
    //do_gettimeofday(&TimeOfDayAtIsr.TimeAt10MsIsr);
    GpioIntProfilePtr->IndexTen++;
    if (GpioIntProfilePtr->IndexTen == GPIO_PROFILE_ARRAY_SIZE)
        GpioIntProfilePtr->IndexTen = 0;
    /************ profiling end************/

    /******** Sync mechanism ***********/
    spin_lock_irqsave(&GpioSyncLock, flags);
    TenMsTime = Now;
    DiffTen = TenMsTime - OneMsTime;

    if (DiffTen < MAX_OFFSET_BETWEEN_1_AND_10MS)    //one ms has happened first
    {
        if (OneMsCounter == 10)
        {
            TimeStamp.OneMsTimeStampAtIsr = Now;
            event = KERN_BOTH_EVENT;
            SyncStarted = 1;
        }
        OneMsCounter = 0;
    }
    else
    {
        if (SyncStarted)
        {
            if (OneMsCounter < 9)
            {
                GpioIntProfilePtr->TickSyncErrAt10msLess++;
                OneMsCounter = 0;
            }
            else if (OneMsCounter > 9)
            {
                GpioIntProfilePtr->TickSyncErrAt10msMore++;
                OneMsCounter = 0;
            }
        }
        else
        {
            if (OneMsCounter != 9)
                OneMsCounter = 0;
        }
    }
    TimeStamp.SFN = OneMsCounter;
    spin_unlock_irqrestore(&GpioSyncLock, flags);
    /******** Sync mechanism ***********/

    if(event != KERN_NO_EVENT)
        WakeupWaitQueue(event);

    TenMsIsrTime = get_cpu_cycle() - Now;
    if (GpioIntProfilePtr->Max10msIsrTime < TenMsIsrTime)
        GpioIntProfilePtr->Max10msIsrTime = TenMsIsrTime;

    return IRQ_HANDLED;
}

Resetting EventPresent after waiting for the event in wait_event_interruptible():

EventPtr = &EventPresent[Instance];
...
status = wait_event_interruptible(wait, (*EventPtr == 1));
*EventPtr = 0;

looks suspicious.

If WakeupWaitQueue() executes concurrently with that reset, then the setting of the event

for (i = 0; i < MAX_GPIO_INT_CONSUMERS; i++)
    {
        EventPresent[i] = 1;
    }
wake_up_interruptible(&wait);

will be lost: the ISR can set EventPresent[i] = 1 just before the woken process overwrites it with 0, so that tick is silently dropped.

It is better to have two independent counters, one for raised events and one for processed events:

uint64_t EventPresent[MAX_GPIO_INT_CONSUMERS];   // Number of raised events
uint64_t EventProcessed[MAX_GPIO_INT_CONSUMERS]; // Number of processed events

In that case the wait condition can be a comparison of these counters:

Gpio_compat_ioctl()
{
    ...
    EventPresentPtr = &EventPresent[Instance];
    EventProcessedPtr = &EventProcessed[Instance];
    ...
    status = wait_event_interruptible(wait, (*EventPresentPtr != *EventProcessedPtr));
    (*EventProcessedPtr)++;
    ...
}

WakeupWaitQueue()
{
    ...
    for (i = 0; i < MAX_GPIO_INT_CONSUMERS; i++)
    {
        EventPresent[i]++;
    }
    wake_up_interruptible(&wait);
}
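
With monotonically increasing counters, a concurrent wakeup can no longer be erased: even if the ISR increments EventPresent[i] between the consumer's wakeup and its increment of EventProcessed[i], the two counters remain unequal, so the next wait_event_interruptible() returns immediately instead of blocking. Since the counters are uint64_t, wrap-around is not a practical concern at a 1 ms event rate.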

This was not a kernel freeze. We had a spare core in the system running bare-metal code, and we routed the 1 ms interrupts to this bare-metal core as well. When the issue occurred, we compared against the bare-metal core's profiling info: on the bare-metal core the ISRs were hit properly, linear to the elapsed time. This ruled out hardware and thermal issues.

Next, on a closer look at the code, we started suspecting that the spinlock was causing the interrupts to be missed. As an experiment, we changed the logic to run the ISRs without the spinlock. Now we see no missed interrupts.

So the issue seems solved. However, with the spinlock present the system also worked properly under normal load conditions; the problem arises only under very high CPU load. This is something I don't have an answer for, though: why, only under high load, does taking the spinlock cause the other interrupt to be missed?
