
How to do wall-clock profiling with dtrace? Or, how to count process-not-running samples with profile provider?

I'm developing a plugin for a third-party host application on OSX, using C++. It is compiled as a .dylib. I wish to profile my plugin as it runs in the host application.

Unfortunately the host calls the plugin code at a rate that varies depending on the plugin's (last) execution time. This means the proportion of wall-clock time the process spends running can vary considerably. With a sampling profiler, the 'time spent' within the plugin therefore isn't grounded in anything useful, because it's only measured relative to other samples that fall within the process. If I improve the performance of the plugin, the pattern of the host's execution of the plugin will change accordingly, and it will be very difficult to measure improvements within the plugin.

I am able to use Instruments, but as far as I can tell I can only get time relative to the process's CPU time.

I've used dtrace to obtain a user stack histogram of the host process:

#!/usr/sbin/dtrace -s

#pragma ustackframes 100
#pragma D option quiet

/* $1 is pid  */
/* $2 is sample rate in Hz (e.g. 100)  */
/* $3 is duration (e.g. '20s')  */

profile-$2
/pid == $1 && arg1/
{
  @[ustack()] = count();
}

tick-$3
{
  exit(0);
}

This works, but it still only provides samples relative to the process's own time, as the predicate is only matched when the process is on-CPU in user space. Even removing the && arg1 condition so that it also fires during the process's kernel calls doesn't really help.

What I really want to know is how many profile-n samples resulted in the process not running at all. Then I can compare the number within my plugin against the total number of samples, and get absolute sample values for my plugin's functions. This makes me wonder - is it safe to assume that the requested profile-n sample rate is honoured? Can I simply take time * sample rate and use that to calculate the 'off-process' time? I had assumed that at, say, 1500Hz, it was dropping samples and running at some other, unknown, rate, but if I can be sure it's sampling at 1500Hz then I can work out the 'off-process' time from that.
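
To make the idea concrete, something like this sketch is what I have in mind (assuming the requested rate really is honoured and that the host runs my plugin on a single thread; $1 is the pid as in the script above):

profile-1500
/cpu == 0/
{
    /* one CPU's firings approximate the total wall-clock samples (rate x duration) */
    @total = count();
}

profile-1500
/pid == $1/
{
    /* firings that caught the target process on-CPU (on any CPU) */
    @oncpu = count();
}

tick-20s
{
    printa("total wall-clock samples: %@d\n", @total);
    printa("on-CPU samples:           %@d\n", @oncpu);
    exit(0);
}

The difference between the two counts would then be the 'off-process' samples I'm after.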

Alternatively, is there a known way to do wall-clock profiling with dtrace?

This makes me wonder - is it safe to assume that the requested profile-n sample rate is honoured?

On Solaris, it's not guaranteed to be honoured: some old hardware lacks the necessary support for arbitrary-resolution timer-based interrupts. I would assume that the same theoretical limitation applies to OS X's DTrace.

In any case, you can test the timer resolution for yourself. The documentation for the profile provider includes an appropriate script and has a bit more on the topic. Here's another script to address your specific question:

bash-3.2# cat test.d
uint64_t last;

profile-1500
/cpu == 0/
{
    now = timestamp;
    /* inter-sample interval, bucketed from 500us to 800us in 30us steps;
       the expected interval at 1500Hz is ~667us (666,667ns) */
    @ = lquantize(now - last, 500000, 800000, 30000);
    last = now;
}

tick-1
/i++ == 10/
{
    exit(0);
}
bash-3.2# dtrace -qs test.d


           value  ------------- Distribution ------------- count    
          560000 |                                         0        
          590000 |@@@                                      1041     
          620000 |@@@@@@@@@@                               4288     
          650000 |@@@@@@@@@@@@@@                           5680     
          680000 |@@@@@@@@@@                               3999     
          710000 |@@@@                                     1451     
          740000 |                                         0        
          770000 |                                         0        
       >= 800000 |                                         1        

bash-3.2# 

The distribution is centred on the expected interval (1/1500s is roughly 667µs, or 666,667ns), so the requested rate is being honoured here. Note that, in practice, you should sample at a frequency that's a prime number: this prevents you from synchronising with other, regularly scheduled, system activity.

Following the discussion in the comments, here's how you can measure the elapsed time spent inside a given function:

pid$target:mylib:myfunc:entry
/!self->depth/
{
    self->depth = ustackdepth;      /* beware recursion */
    self->start_time = timestamp;   /* for relative wall time calculations */
    self->start_vtime = vtimestamp; /* CPU time */      
}

pid$target:mylib:myfunc:return
/ustackdepth == self->depth/
{
    printf("%d ms (real) %d ms (CPU)\n",
        (timestamp - self->start_time) / 1000000,
        (vtimestamp - self->start_vtime) / 1000000);
    self->depth = 0;
}   

If the function is called at a high frequency then clearly you could maintain aggregations of the elapsed times, e.g. to calculate the average cost of the function.
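
For example, a minimal sketch of the return clause rewritten to aggregate rather than print (the entry clause above is unchanged; mylib and myfunc are placeholders as before):

pid$target:mylib:myfunc:return
/ustackdepth == self->depth/
{
    /* accumulate the average wall-clock and CPU cost, in nanoseconds */
    @["myfunc avg real (ns)"] = avg(timestamp - self->start_time);
    @["myfunc avg CPU (ns)"]  = avg(vtimestamp - self->start_vtime);
    self->depth = 0;
}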

It's entirely possible to perform a similar exercise for all of the functions in your library, although it can be quite an onerous task to eliminate spurious results from recursion and tail-call optimisation. To be more useful you would probably also want to exclude from a function's cost the time spent in the functions it calls; this makes it even harder work (but not impossible). Thus, armed with the means above to create an objective benchmark, I would be more inclined to persist with the profiling approach, probably something like:

# cat sample.d 
/* arg1 is the user-space program counter at the time of the sample;
   $1 and $2 bound the memory region of interest */
profile-997
/pid == $target && arg1 >= $1 && arg1 < $2/
{
    @[ufunc(arg1)] = count();
}

END
{
    trunc(@,5);
    exit(0);
}
#

This captures the five most frequently seen functions within a given region of memory. For example (and using pmap on Solaris to locate the address range of libc),

# dtrace -qs sample.d -p `pgrep -n firefox` 0xfc090000 0xfc200000
^C

  libc.so.1`mutex_lock_impl                                        35
  libc.so.1`clear_lockbyte                                         46
  libc.so.1`gettimeofday                                           71
  libc.so.1`memset                                                 73
  libc.so.1`memcpy                                                170
# 

This turns out to be quite a good illustration of the benefit of sampling: memcpy() and memset() are hand-coded in assembly, i.e. we find that the most time-consuming functions have already been optimised.
