Measuring context switch time for threads

I want to calculate the context switch time, and I am thinking of using a mutex and condition variables to signal between 2 threads so that only one thread runs at a time. I can use CLOCK_MONOTONIC to measure the entire execution time and CLOCK_THREAD_CPUTIME_ID to measure how long each thread runs.
Then the context switch time is (total_time - thread_1_time - thread_2_time). To get a more accurate result, I can just loop over it and take the average.

Is this a correct way to approximate the context switch time? I can't think of anything that might go wrong, but I am getting answers that are under 1 nanosecond.

I forgot to mention that the more times I loop over it and take the average, the smaller the results I get.

Edit

Here is a snippet of the code that I have:

    typedef struct
    {
      struct timespec start;
      struct timespec end;
    }thread_time;

    ...


    // each thread function looks similar to this
    void* thread_1_func(void* arg)
    {
       // "t" avoids reusing the type name thread_time as a variable name
       thread_time* t = (thread_time*) arg;

       clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(t->start));
       for(x = 0; x < loop; ++x)
       {
         // where it switches to another thread
       }
       clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(t->end));

       return NULL;
    }

    void* thread_2_func(void* arg)
    {
       // similar to the above
    }

    int main()
    {
       ...
       pthread_t thread_1;
       pthread_t thread_2;

       thread_time thread_1_time;
       thread_time thread_2_time;

       struct timespec start, end;

       // stamps the start time
       clock_gettime(CLOCK_MONOTONIC, &start);

       // create two threads with the time structs as the arguments
       pthread_create(&thread_1, NULL, &thread_1_func, (void*) &thread_1_time);
       pthread_create(&thread_2, NULL, &thread_2_func, (void*) &thread_2_time);
       // waits for the two threads to terminate
       pthread_join(thread_1, NULL);
       pthread_join(thread_2, NULL);

       // stamps the end time
       clock_gettime(CLOCK_MONOTONIC, &end);

       // then I calculate the difference between the total execution time and the combined execution time of the two threads
    }

First of all, using CLOCK_THREAD_CPUTIME_ID is probably very wrong; this clock reports only the CPU time charged to the calling thread. The context switch, however, happens in the kernel, between the two threads, so you'd want to use another clock. Also, on multiprocessing systems the clocks can give different values from one processor to another! Thus I suggest you use CLOCK_REALTIME or CLOCK_MONOTONIC instead. However, be warned that even if you read either of these twice in rapid succession, the timestamps will usually already be tens of nanoseconds apart.


As for context switches: there are many kinds of context switches. The fastest approach is to switch from one thread to another entirely in software. This just means that you push the old registers onto the stack, set the task-switched flag so that the SSE/FP registers will be saved lazily, save the stack pointer, load the new stack pointer, and return from that function. Since the other thread had done the same, the return from that function happens in the other thread.

This thread-to-thread switch is quite fast; its overhead is about the same as for any system call. Switching from one process to another is much slower, because the user-space page tables must be flushed and switched by setting the CR3 register; this causes misses in the TLB, which caches the mappings from virtual addresses to physical ones.


However, a context switch/system call overhead of under 1 ns does not really seem plausible. It is very probable that there is either hyperthreading or 2 CPU cores here, so I suggest that you set the CPU affinity of that process so that Linux only ever runs it on, say, the first CPU core:

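    #define _GNU_SOURCE   /* needed for cpu_set_t and the CPU_* macros */
    #include <sched.h>

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);    /* allow CPU 0 only */
    /* pid 0 means the calling thread; returns 0 on success, -1 on error */
    int result = sched_setaffinity(0, sizeof(mask), &mask);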

Then you should be pretty sure that the time you're measuring comes from a real context switch. Also, to measure the time for switching the floating point / SSE state (this happens lazily), you should have some floating point variables and do calculations on them prior to the context switch, then add, say, 0.1 to some volatile floating point variable after the context switch to see if it has an effect on the switching time.

This is not straightforward, but as usual someone has already done a lot of work on this. (I'm not including the source here because I cannot see any license mentioned.)

https://github.com/tsuna/contextswitch/blob/master/timetctxsw.c

If you copy that file to a Linux machine as context_switch_time.c, you can compile and run it like this:

    gcc -D_GNU_SOURCE -Wall -O3 -std=c11 context_switch_time.c -lpthread
    ./a.out

I got the following result on a small VM:

    2000000  thread context switches in 2178645536ns (1089.3ns/ctxsw)

This question has come up before. For Linux you can find some material here:

Write a C program to measure time spent in context switch in Linux OS

Note that while the user was running the test in the above link, they were also hammering the machine with games and compiling, which is why the context switches were taking a long time. Some more info here:

How can you measure the time spent in a context switch under Java platform
