
Measuring amount of CPU time taken by a piece of code, in C on Unix/Linux

Can clock() be used as a dependable API to measure the CPU time taken to execute a snippet of code? When verified using times() and clock(), neither seems to measure the CPU time taken precisely.

Firstly, can the APIs clock()/times() be used to measure the execution time of a function or snippet of code, as in the example below? Is there a better, more dependable alternative? The mechanism has to work on Linux, HP-UX, IBM AIX and Sun Solaris, as we need to measure (and compare) the performance of a piece of code on all these platforms.

Kindly suggest. Also, please let me know if I am missing anything trivial.

bbb@m_001:/tmp/kk1$ ./perf_clock 102400
{clock(): S          0 E          0 D    0.0000000000}
bbb@m_001:/tmp/kk1$ ./perf_clock 204800
{clock(): S          0 E      10000 D    0.0100000000}
bbb@m_001:/tmp/kk1$ cat perf_clock.c

#include <stdio.h>
#include <stdlib.h>   /* for atoi(), malloc(), free() */
#include <string.h>
#include <time.h>
#include <unistd.h>

void test_clock(char* sbuf, int* len){
    clock_t start, end; int i=0;
    start = clock();
    while(i++ < 500) memset(sbuf, 0, *len);
    end = clock();
    /* clock_t has no portable printf specifier; cast explicitly */
    printf("{clock(): S %10lu E %10lu D %15.10f}\n",
        (unsigned long)start, (unsigned long)end,
        (end-start)/(double)CLOCKS_PER_SEC);
}
int main(int argc, char* argv[])
{
        if (argc < 2) return 1;
        int len = atoi(argv[1]);
        char *sbuf = malloc(len);
        test_clock(sbuf, &len);
        free(sbuf); return 0;
}

The results seem to say that memset()ing a 100 KB chunk 500 times takes no time at all. Or do they say that the time is simply below what clock() can resolve in microseconds?

Actually, it is not memset() but another function [one that prepares a huge structure of around 1 MB, mallocs a copy of it, does an Oracle DB select and populates these structures with the data from the DB] that I am trying to measure. Even this shows 0 ticks, and that is what has kept me confused.

Thanks!

On recent Linux systems (*), you can get this information from the /proc filesystem. In the file /proc/PID/stat, the 14th field is the number of jiffies spent in userland code and the 15th field is the number of jiffies spent in system code.

If you want to see the data on a per-thread basis, you should reference the file /proc/PID/task/TID/stat instead.

To convert jiffies to microseconds, you can use the following:

#include <unistd.h>   /* for sysconf() */

#define USEC_PER_SEC         1000000UL

long long jiffies_to_microsecond(long long jiffies)
{
    long hz = sysconf(_SC_CLK_TCK);
    if (hz <= USEC_PER_SEC && !(USEC_PER_SEC % hz))
    {
        return (USEC_PER_SEC / hz) * jiffies;
    }
    else if (hz > USEC_PER_SEC && !(hz % USEC_PER_SEC))
    {
        return (jiffies + (hz / USEC_PER_SEC) - 1) / (hz / USEC_PER_SEC);
    }
    else
    {
        return (jiffies * USEC_PER_SEC) / hz;
    }
}

If all you care about is the per-process statistics, getrusage is easier. But if you want to be prepared to do this on a per-thread basis, this technique is better: other than the file name, the code for getting the data per-process or per-thread is identical.

* - I'm not sure exactly when the stat file was introduced. You will need to verify your system has it.

I would give a try with getrusage and check system and user time.

Also check with gettimeofday to compare with wall clock time.

I would try to correlate the time with the shell's time command, as a sanity check.

You should also consider that the compiler may be optimizing the loop away. Since the memset does not depend on the loop variable, the compiler will certainly be tempted to apply an optimization known as loop-invariant code motion.

I would also caution that clearing 100 KB 500 times (about 50 MB in total, much of it likely in cache) really amounts to roughly 6.4 or 12.8 million CPU operations, as memset certainly writes in 8-byte or 4-byte quantities. Stores are a bit expensive and 100 KB adds some L1 cache pressure, but sustaining about one operation per nanosecond is not hard for a multi-GHz CPU, so the whole loop could finish in a handful of milliseconds.

One imagines that a few milliseconds would round off to at most one clock tick, but I would worry about relying on that as well.

You can use clock() to get the number of CPU clock ticks consumed since the program started.

Or you can use the Linux time command, e.g.: time [program] [arguments]

There is some info on HP's page about high-resolution timers. The same trick, _Asm_mov_from_ar(_AREG_ITC), is used in http://www.fftw.org/cycle.h too.

I still have to confirm whether this can really be the solution.

Sample prog, as tested on HP-UX 11.31:

bbb@m_001/tmp/prof > ./perf_ticks 1024
ticks-memset {func [1401.000000] inline [30.000000]} noop [9.000000]
bbb@m_001/tmp/prof > cat perf_ticks.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include "cycle.h" /* one from http://www.fftw.org/cycle.h */
void test_ticks(char* sbuf, int* len){
    memset(sbuf, 0, *len);
}
int main(int argc, char* argv[]){
        if (argc < 2) return 1;
        int len = atoi(argv[1]);
        char *sbuf = malloc(len);
        ticks t1,t2,t3,t4,t5,t6;
        t1 = getticks(); test_ticks(sbuf, &len); t2 = getticks();
        t3 = getticks(); memset(sbuf, 0, len);   t4 = getticks();
        t5 = getticks();                         t6 = getticks();
        /* elapsed() returns a double, so %f (not %llf) is correct */
        printf("ticks-memset {func [%f] inline [%f]} noop [%f]\n",
                          elapsed(t2,t1), elapsed(t4,t3), elapsed(t6,t5));
        free(sbuf); return 0;
}
bbb@m_001/tmp/prof >

Resource usage of a process/thread is updated by the OS only periodically, so it is entirely possible for a code snippet to complete before the next update, producing a zero resource-usage diff. I can't say anything about HP-UX or AIX; for Sun, I would refer you to the Solaris Performance and Tools book. For Linux, you want to look at oprofile and the newer perf tool. On the profiling side, valgrind would be of much help.
