What could cause spikes in the elapsed time of a pthread critical section when the scheduling policy is SCHED_RR?

Question

I am doing some timing tests on Linux. My kernel is Preempt-RT (although a vanilla kernel gives similar results in my tests).

I have two pthreads running concurrently on the same processor (affinity is set explicitly). They are real-time threads (priority 99).

I have a critical section protected by a spin lock, and the two threads compete for that lock. Inside the critical section there is a single increment operation, and I try to measure the elapsed time of that operation.

Code example with __rdtsc:

pthread_spin_lock(&lock);

start_time = __rdtsc();
++cnt; //shared ram variable, type is unsigned long long
stop_time = __rdtsc();

pthread_spin_unlock(&lock);

Code example with the chrono timer:

pthread_spin_lock(&lock);

auto _start_time = std::chrono::high_resolution_clock::now();
++cnt; //shared ram variable, type is unsigned long long
auto _stop_time = std::chrono::high_resolution_clock::now();

pthread_spin_unlock(&lock);

The threads run the loop a few million times and then terminate. After unlocking the spin lock, I record the average elapsed time and the maximum elapsed time.

Now, here is where things get interesting (at least for me):

Test 1: The threads use the scheduling policy SCHED_RR:

Thread no: 0, Max Time: 34124, Avg Time: 28.114271, Run Cnt: 10000000

Thread no: 1, Max Time: 339256976, Avg Time: 74.781960, Run Cnt: 10000000

Test 2: The threads use the scheduling policy SCHED_FIFO:

Thread no: 0, Max Time: 33114, Avg Time: 48.414173, Run Cnt: 10000000

Thread no: 1, Max Time: 38637, Avg Time: 24.327742, Run Cnt: 10000000

Test 3: Only a single thread, with the scheduling policy SCHED_RR:

Thread no: 0, Max Time: 34584, Avg Time: 54.165470, Run Cnt: 10000000

Note: The main thread is a non-RT thread pinned to a separate processor. It is not important here.

Note 1: All tests give approximately the same results every time I run them.

Note 2: The results shown are the output of rdtsc. However, the chrono timer results are very similar.

So I think I may have misunderstood the scheduler, which leads me to these questions:

How do the huge maximum time spikes occur in test 1? Tests 2 and 3 don't behave like that.
Why is there such a large gap between the maximum and the average? What causes it, an interrupt such as the timer?

My complete test code is:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <float.h>
#include <pthread.h>
#include <cxxabi.h>
#include <limits.h>
#include <sched.h>
#include <sys/mman.h>
#include <unistd.h> 
#include <sys/time.h> 
#include <sys/resource.h> 
#include <malloc.h>
#include <chrono>
#include <x86intrin.h>   /* declares __rdtsc() on GCC/Clang for x86 */

/********* TEST CONFIG ************/

#define TEST_PTHREAD_RUN_CNT    10000000    //1000000000
#define NUM_OF_TEST_PTHREADS    2
#define MAIN_THREAD_CORE_INDEX  0
#define TEST_PTHREAD_PRIO       99
#define TEST_PTHREAD_POLICY     SCHED_RR

#define TIME_RDTSC              1
#define TIME_CHRONO             0
/**********************************/

/**********************************/
struct param_list_s
 {
    unsigned int thread_no;
 };
/**********************************/

/********* PROCESS RAM ************/
pthread_t threads[NUM_OF_TEST_PTHREADS];
struct param_list_s param_list[NUM_OF_TEST_PTHREADS];
unsigned long long max_time[NUM_OF_TEST_PTHREADS];
unsigned long long _max_time[NUM_OF_TEST_PTHREADS];
unsigned long long tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long _tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long run_cnt[NUM_OF_TEST_PTHREADS];
unsigned long long cnt;
pthread_spinlock_t lock;
/**********************************/

/*Proto*/
static void configureMemoryBehavior(void);
void create_rt_pthread(unsigned int thread_no);

/*
* Date............: 
* Function........: main
* Description.....: 
*/
int main(void)
{
    cpu_set_t  mask;
    int i;

    for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i)
     {
        max_time[i] = 0;
        tot_time[i] = 0;
        run_cnt[i] = 0;

        _max_time[i] = 0;
        _tot_time[i] = 0;
     }
    cnt = 0;

    printf("\nSetting scheduler affinity for the process...");
    CPU_ZERO(&mask);
    CPU_SET(MAIN_THREAD_CORE_INDEX, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
    printf("done.\n");

    configureMemoryBehavior();

    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

    for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i)
     {
        create_rt_pthread(i);
     }

    printf("Waiting threads to join\n...\n");
    for (i = 0; i < NUM_OF_TEST_PTHREADS; i++)
    {
        pthread_join(threads[i], NULL);
        #if(TIME_RDTSC == 1)
        printf("Thread no: %d, Max Time: %llu, Avg Time: %f, Run Cnt: %llu\n", i, max_time[i], (float)((float)tot_time[i] / run_cnt[i]), run_cnt[i]);
        #endif

        #if(TIME_CHRONO == 1)
        printf("Thread no: %d, Max Time: %llu, Avg Time: %f, Run Cnt: %llu\n", i, _max_time[i], (float)((float)_tot_time[i] / run_cnt[i]), run_cnt[i]);
        #endif
    }
    printf("All threads joined\n");
    printf("Shared Cnt: %llu\n", cnt);

    return 0;
}


/*
* Date............:
* Function........: thread_func
* Description.....:
*/
void *thread_func(void *argv)
{

    unsigned long long i, start_time, stop_time, latency = 0;
    unsigned int thread_no;

    thread_no = ((struct param_list_s *)argv)->thread_no;
    i = 0;
    while (1)
     {
        #if(TIME_RDTSC == 1)
        pthread_spin_lock(&lock);
        start_time = __rdtsc();
        ++cnt;
        stop_time = __rdtsc();
        pthread_spin_unlock(&lock);

        if (stop_time > start_time)
        {
            latency = stop_time - start_time;
            ++run_cnt[thread_no];

            tot_time[thread_no] += latency;
            if (latency > max_time[thread_no])
                max_time[thread_no] = latency;
        }
        #endif

        #if(TIME_CHRONO == 1)
        pthread_spin_lock(&lock);

        auto _start_time = std::chrono::high_resolution_clock::now();
        ++cnt;
        auto _stop_time = std::chrono::high_resolution_clock::now();

        pthread_spin_unlock(&lock);

        auto __start_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_start_time.time_since_epoch()).count();
        auto __stop_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_stop_time.time_since_epoch()).count();
        auto __latency = __stop_time - __start_time;

        if (__stop_time > __start_time)
        {
            _tot_time[thread_no] += __latency;
            ++run_cnt[thread_no];
            if (__latency > _max_time[thread_no])
            {
                _max_time[thread_no] = __latency;
            }
        }
        #endif

        if (++i >= TEST_PTHREAD_RUN_CNT)
            break;
     }

    return 0;
}


/*
* Date............:
* Function........: create_rt_pthread
* Description.....:
*/
void create_rt_pthread(unsigned int thread_no)
{

    struct sched_param  param;
    pthread_attr_t      attr;

    printf("Creating a new real-time thread\n");
    /* Initialize pthread attributes (default values) */
    pthread_attr_init(&attr);

    /* Set a specific stack size  */
    pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);

    /* Set scheduler policy and priority of pthread */
    pthread_attr_setschedpolicy(&attr, TEST_PTHREAD_POLICY);
    param.sched_priority = TEST_PTHREAD_PRIO;
    pthread_attr_setschedparam(&attr, &param);

    /* Set the processor affinity*/
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);

    pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);

    /* Use scheduling parameters of attr */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    param_list[thread_no].thread_no = thread_no;

    if(pthread_create(&threads[thread_no], &attr, thread_func, (void *)&param_list[thread_no]) != 0)
     {
        printf("Thread could not be created.\n");
        exit(-1);
     }
}


/*
* Date............:
* Function........: configureMemoryBehavior
* Description.....:
*/
static void configureMemoryBehavior(void)
{
    printf("\nLocking memory...");
    /* Lock all current and future pages in RAM
       so that they cannot be paged out */
    if (mlockall(MCL_CURRENT | MCL_FUTURE))
        perror("mlockall failed:");

    /* Turn off malloc trimming.*/
    mallopt(M_TRIM_THRESHOLD, -1);

    /* Turn off mmap usage. */
    mallopt(M_MMAP_MAX, 0);
    printf("done.\n");
}

Answer 1

When you run with SCHED_FIFO, one of your threads starts running. It then runs until it is finished, because that is how SCHED_FIFO works: nothing will preempt it. The time it spends holding the spinlock is therefore relatively consistent. Then, after the first thread has finished, the second thread runs to completion without any contention for the lock, so its timings are consistent too. Both still show some jitter due to interrupts and so forth, but that jitter is about the same for the two threads.

When you run with SCHED_RR, one of your threads runs for a while. At the end of its time slice it gets preempted and the other one gets to run, because that is how SCHED_RR works. Now, there is a good chance that it gets preempted while holding the spinlock. So the other thread, once running, immediately tries to grab the spinlock and fails, because the first thread still holds it. It just keeps trying until the end of its own time slice (that is how spinlocks work: the thread never blocks while waiting to acquire the lock). Of course it accomplishes nothing during this time. Eventually that time slice ends and the thread holding the lock gets to run again. But the time attributed to its single increment operation now includes the entire time slice that the other thread spent spinning.
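A rough sanity check of this explanation is to convert the Test 1 spike from TSC cycles into milliseconds and compare it with the round-robin quantum. The sketch below is only illustrative: the helper names `cycles_to_ms` and `rr_quantum_ms` are made up for this example, and the TSC frequency is an assumption, since the question does not say what the test machine's TSC rate is.

```cpp
#include <sched.h>   // sched_rr_get_interval
#include <time.h>    // struct timespec

// Convert a TSC cycle count to milliseconds.
// tsc_hz is an ASSUMPTION supplied by the caller; the question does not
// state the TSC frequency of the machine the tests ran on.
double cycles_to_ms(unsigned long long cycles, double tsc_hz)
{
    return static_cast<double>(cycles) / tsc_hz * 1000.0;
}

// Ask the kernel for the round-robin quantum of the calling thread, in ms.
// Returns a negative value if the call fails. The result is only meaningful
// when the caller is actually scheduled under SCHED_RR.
double rr_quantum_ms()
{
    struct timespec ts;
    if (sched_rr_get_interval(0, &ts) != 0)
        return -1.0;
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}
```

With a hypothetical 3 GHz TSC, `cycles_to_ms(339256976ULL, 3.0e9)` comes to about 113 ms. That is the same order of magnitude as the default Linux SCHED_RR quantum of 100 ms (tunable through /proc/sys/kernel/sched_rr_timeslice_ms), which fits the idea that the spike is one full time slice spent by the other thread spinning.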

I think if you increase the maximum count (TEST_PTHREAD_RUN_CNT), you'll see the SCHED_RR behavior even out, as both of your threads eventually get subjected to this effect. Right now, I'm guessing there's a good chance that one thread can pretty much finish within one or two time slices.

If you want to lock out another thread running at the same priority on the same processor, you should probably be using a pthread_mutex_t. It acts pretty much the same as a spinlock when the acquisition succeeds, but blocks when it can't acquire the lock.
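As a minimal sketch of that suggestion, the critical section from the question can be guarded by a pthread_mutex_t instead of the spinlock. The names `g_mutex`, `g_cnt`, `mutex_worker` and `run_mutex_counter` are invented for this illustration, and the real-time policy, priority and affinity setup from the question is left out so that the example runs without special privileges.

```cpp
#include <pthread.h>
#include <vector>

namespace {
pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
unsigned long long g_cnt = 0;   // shared counter, as in the question

// Each worker increments the shared counter `iters` times under the mutex.
void *mutex_worker(void *arg)
{
    const unsigned long long iters = *static_cast<unsigned long long *>(arg);
    for (unsigned long long i = 0; i < iters; ++i) {
        pthread_mutex_lock(&g_mutex);    // blocks instead of spinning
        ++g_cnt;
        pthread_mutex_unlock(&g_mutex);
    }
    return nullptr;
}
} // namespace

// Start `nthreads` workers, wait for them, and return the final counter.
unsigned long long run_mutex_counter(int nthreads, unsigned long long iters)
{
    g_cnt = 0;
    std::vector<pthread_t> tids;
    for (int i = 0; i < nthreads; ++i) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, mutex_worker, &iters) == 0)
            tids.push_back(tid);         // only join threads that started
    }
    for (pthread_t tid : tids)
        pthread_join(tid, nullptr);
    return g_cnt;
}
```

Calling `run_mutex_counter(2, 100000)` should return 200000 when both threads start successfully. The point of the design is that a thread that finds the mutex taken goes to sleep, so the lock holder can be scheduled again right away instead of waiting out the other thread's time slice.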

But then note: that change might well turn the SCHED_RR behavior into the SCHED_FIFO behavior. Most of the time the preemption will happen while one thread holds the lock, so the other thread gets to run for a few instructions until it attempts to acquire the lock. It then blocks, and the first thread gets to run again for a full time slice.

Overall, it's really dicey to attempt to run two RT priority threads on one processor where both of them are expected to run for long periods of time. RT priority works best where you lock each thread to its own core, or where the RT threads need to get scheduled immediately but only run for a short time before blocking again.
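Locking each thread to its own core could look roughly like the following. This is a sketch, not the asker's code: `pin_attr_to_core` is a hypothetical helper, and the mapping suggested in the note after it assumes the machine has enough cores to give each test thread one of its own besides the main thread's core.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for CPU_SET and pthread_attr_setaffinity_np (g++ defines it already)
#endif
#include <pthread.h>
#include <sched.h>

// Restrict a thread attribute object to exactly one core.
// Returns 0 on success or a nonzero error number, like the underlying call.
int pin_attr_to_core(pthread_attr_t *attr, int core)
{
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core, &cpuset);
    return pthread_attr_setaffinity_np(attr, sizeof(cpu_set_t), &cpuset);
}
```

In create_rt_pthread, the fixed CPU_SET(1, &cpuset) could then be replaced by something like `pin_attr_to_core(&attr, 1 + thread_no)`, which keeps core 0 for the main thread and gives each test thread a different core. With that layout the two threads still contend for the spinlock, but neither can be preempted by the other while holding it.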


Solution 1: accepted, 2019-03-26 15:46:02
