
Measuring execution time using rdtsc within the kernel

I am writing code to measure how long a sequence of code takes to run in the kernel, by loading it into the kernel as a module. I use a common rdtsc routine to calculate the time. Interestingly, a similar routine running in user mode produces normal values, whereas in kernel mode the result is always 0, no matter how many lines of code I add inside the time_count macro. The calculation I use here is an ordinary matrix product function, and the cycle count should grow rapidly as the matrix dimension increases. Can anyone point out the mistake in my code that prevents me from measuring the cycle count in the kernel?

#include <linux/init.h>
#include <linux/module.h>

int matrix_product(){
  int array1[500][500], array2[500][500], array3[500][500];
  int i, j, k, sum;

  for(i = 0; i < 50000; i++){
    for(j = 0; j < 50000; j++){
      array1[i][j] = 5*i + j;
      array2[i][j] = 5*i + j;
    }
  }

  for(i = 0; i < 50000; i++){
    for(j = 0; j < 50000; j++){
      for(k = 0; k < 50000; k++)
        sum += array1[i][k]*array2[k][j];
      array3[i][j] = sum;
      sum = 0;
    }
  }
  return 0;
}

static __inline__ unsigned long long rdtsc(void)
{
 unsigned long hi, lo;
 __asm__ __volatile__ ("xorl %%eax,%%eax\ncpuid" ::: "%rax", "%rbx", "%rcx", "%rdx");
 __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
 return ((unsigned long long)lo) | (((unsigned long long)hi)<<32) ;
}

static int my_init(void)
{
  unsigned long str, end, curr, best, tsc, best_curr;
  long i, t;

#define time_count(codes) for(i=0; i<120000; i++){str=rdtsc(); codes; end=rdtsc(); curr=end-str; if(curr<best)best=curr;}

 best = ~0;
 time_count();
 tsc = best;

 best = ~0;
 time_count(matrix_product());
 best_curr = best;
 printk("<0>matrix product: %lu ticks\n", best_curr-tsc);

 return 0;
}

static void my_exit(void){
  return;
}

module_init(my_init);
module_exit(my_exit);

Any help is appreciated! Thanks.

rdtsc is not guaranteed to be available on every CPU, to run at a constant rate, or to be consistent between different cores.

You should use a reliable and portable function like getrawmonotonic unless you have special requirements for the timestamps.
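As a rough illustration of clock-based timing, here is a user-space sketch, not kernel code. It assumes Linux and uses clock_gettime with CLOCK_MONOTONIC_RAW, which reads the same raw monotonic clock that getrawmonotonic exposes inside the kernel. On newer kernels the in-kernel call is, as far as I know, spelled ktime_get_raw_ts64 or ktime_get_raw_ns, so check the headers of the kernel you build against. The names raw_monotonic_ns, workload and best_elapsed_ns are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the raw monotonic clock and return it as nanoseconds. */
static uint64_t raw_monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* The result is stored in a volatile variable so that the compiler
 * cannot discard the loop as dead code. */
static volatile uint64_t sink;

/* Hypothetical workload: sum of i*j over an n-by-n grid. */
static void workload(unsigned n)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            sum += (uint64_t)i * j;
    sink = sum;
}

/* Run the workload `reps` times and keep the smallest elapsed time. */
static uint64_t best_elapsed_ns(unsigned n, unsigned reps)
{
    uint64_t best = UINT64_MAX;
    for (unsigned r = 0; r < reps; r++) {
        uint64_t start = raw_monotonic_ns();
        workload(n);
        uint64_t end = raw_monotonic_ns();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Calling best_elapsed_ns(500, 20) would give the best of 20 runs in nanoseconds. The volatile store matters: a function whose result is never used can be optimized away entirely, and then there is nothing left to time.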

If you really want to use cycles directly, the kernel already defines the get_cycles and cpuid functions for this.
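For comparison, a user-space sketch of cycle-based timing might look like the following. Inside a module you would call get_cycles() (declared via linux/timex.h) instead of hand-written inline assembly. Here the compiler intrinsic __rdtsc from x86intrin.h stands in for it, with lfence as a lightweight ordering barrier, and a clock_gettime fallback on non-x86-64 targets. The names read_cycles, busy_loop and min_delta are hypothetical, and the fallback returns nanoseconds rather than cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <x86intrin.h>
#define HAVE_TSC 1
#else
#include <time.h>
#define HAVE_TSC 0
#endif

/* User-space stand-in for the kernel's get_cycles(). */
static inline uint64_t read_cycles(void)
{
#if HAVE_TSC
    _mm_lfence();               /* wait for earlier instructions to finish */
    uint64_t t = __rdtsc();
    _mm_lfence();               /* keep later instructions after the read */
    return t;
#else
    struct timespec ts;         /* assumption: no TSC, fall back to a clock */
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Volatile sink so the measured loop is not removed by the optimizer. */
static volatile uint64_t cycles_sink;

/* Hypothetical workload: sum of squares below n. */
static void busy_loop(unsigned n)
{
    uint64_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc += (uint64_t)i * i;
    cycles_sink = acc;
}

/* Smallest counter delta over `reps` runs of fn(arg).
 * Passing fn == NULL measures the bare overhead of the two reads. */
static uint64_t min_delta(void (*fn)(unsigned), unsigned arg, unsigned reps)
{
    uint64_t best = UINT64_MAX;
    for (unsigned r = 0; r < reps; r++) {
        uint64_t start = read_cycles();
        if (fn)
            fn(arg);
        uint64_t end = read_cycles();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

A net figure can then be taken as min_delta(busy_loop, 100000, 50) minus min_delta(NULL, 0, 50), clamped at zero in case the overhead estimate comes out larger. Using 64-bit types throughout avoids truncating the counter on 32-bit builds.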
