Execution time inconsistency in a program with high priority in the scheduler using RT Kernel
We are trying to implement a program that sends commands to a robot at a given cycle time, so this program should be a real-time application. We set up a PC with a PREEMPT_RT Linux kernel and launch our programs with chrt -f 98 or chrt -rr 99 to define the scheduling policy and priority. Loading the kernel and launching the program seem to work fine (see details below).
Now we were measuring the time (CPU ticks) it takes for our program to be computed. We expected this time to be constant, with very little variation. What we measured instead were quite significant differences in computation time. Of course, we thought this could be undefined behavior in our rather complex program, so we created a very basic program and measured its time as well. The behavior was similarly bad.
First of all, we installed an RT Linux kernel on the PC using this tutorial. The main characteristics of the PC are:
| PC Characteristics | Details |
|---|---|
| CPU | Intel(R) Atom(TM) Processor E3950 @ 1.60GHz with 4 cores |
| Memory (RAM) | 8 GB |
| Operating System | Ubuntu 20.04.1 LTS |
| Kernel | Linux 5.9.1-rt20 SMP PREEMPT_RT |
| Architecture | x86-64 |
The first time we detected this problem was when we were measuring the time it takes to execute this "complex" program with a single thread. We ran a few tests with this program, and also with a simpler one:
We also ran a latency test on the PC. For this, we followed this tutorial, and these are the results:
The processes are shown in htop with a priority of RT.
We called the function multiple times in the program and measured the time each call takes. The results of the two tests are:

From this we observed that:
We did the same test, but this time with a simpler program:
```cpp
#include <vector>
#include <iostream>
#include <time.h>

int main(int argc, char** argv) {
    const unsigned int iterations = 5000;
    double a = 5.5;
    double b = 5.5;
    double c = 4.5;
    std::vector<double> wallTime(iterations, 0);
    std::vector<double> cpuTime(iterations, 0);
    struct timespec beginWallTime, endWallTime, beginCPUTime, endCPUTime;

    std::cout << "Iteration | WallTime | cpuTime" << std::endl;

    for (unsigned int i = 0; i < iterations; i++) {
        // Start measuring time
        clock_gettime(CLOCK_REALTIME, &beginWallTime);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &beginCPUTime);

        // Function under test
        a = b + c + i;

        // Stop measuring time and calculate the elapsed time
        clock_gettime(CLOCK_REALTIME, &endWallTime);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endCPUTime);

        wallTime[i] = (endWallTime.tv_sec - beginWallTime.tv_sec) + (endWallTime.tv_nsec - beginWallTime.tv_nsec) * 1e-9;
        cpuTime[i]  = (endCPUTime.tv_sec - beginCPUTime.tv_sec) + (endCPUTime.tv_nsec - beginCPUTime.tv_nsec) * 1e-9;

        std::cout << i << " | " << wallTime[i] << " | " << cpuTime[i] << std::endl;
    }
    return 0;
}
```
We understand that:

Of course, we can give more details.

Thanks a lot for your help!
Your function will almost certainly be optimized away, so you are just measuring how long it takes to read the clocks. And as you can see, that doesn't take very long, with some exceptions:
The very first time you run the code (unless you just compiled it), its pages need to be loaded from disk. If you are unlucky, the code spans pages and you include the loading of the next page in the measured time. Quite unlikely given the code size.
On the first loop iteration, the code and any data need to be loaded into cache, so it takes longer to execute. The branch predictor might also need a few iterations to predict the loop correctly, so the second and third iterations might be slightly longer too.
For everything else, I think you can blame scheduling:
You can do little about IRQs. Some you can pin to specific cores, but others are simply essential (like the timer interrupt for the scheduler itself). You kind of just have to live with that.
But you can pin your program to a specific CPU and pin everything else to all the other cores, basically reserving that core for the real-time code. I guess you would have to use cgroups for this, to keep everything else off the chosen core. You might still get some kernel threads running on the reserved core; nothing you can do about that. But this should eliminate most of the large execution times.