Profiling the FreeBSD kernel with DTrace
I'm looking to improve interface destruction time with FreeBSD. Destroying thousands of interfaces takes several minutes on my test machine running -CURRENT, and while -- admittedly -- my use case may be an unusual one, I'd like to understand what's taking the system so long.

From my initial observations, I was able to establish that most of the time is spent waiting somewhere inside if_detach_internal(). So in an attempt to profile this function, I came up with the following DTrace script:
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option dynvarsize=256m

fbt:kernel:if_detach_internal:entry
{
    self->traceme = 1;
    t[probefunc] = timestamp;
}

fbt:kernel:if_detach_internal:return
{
    dt = timestamp - t[probefunc];
    @ft[probefunc] = sum(dt);
    t[probefunc] = 0;
    self->traceme = 0;
}

fbt:kernel::entry
/self->traceme/
{
    t[probefunc] = timestamp;
}

fbt:kernel::return
/self->traceme/
{
    dt = timestamp - t[probefunc];
    @ft[probefunc] = sum(dt);
    t[probefunc] = 0;
}
By hooking the entry and return fbt probes, I'm expecting to get a list of function names and cumulative execution times for every function called by if_detach_internal() (no matter the stack depth), and to filter out everything else.

What I'm getting, however, looks like this (destroying 250 interfaces):
  callout_when                            1676
  sched_load                              1779
  if_rele                                 1801
  [...]
  rt_unlinkrte                     10296062843
  sched_switch                     10408456866
  rt_checkdelroute                 11562396547
  rn_walktree                      12404143265
  rib_walk_del                     12553013469
  if_detach_internal               24335505097
  uma_zfree_arg                 25045046322788
  intr_event_schedule_thread    58336370701120
  swi_sched                     83355263713937
  spinlock_enter               116681093870088
  [...]
  spinlock_exit               4492719328120735
  cpu_search_lowest          16750701670277714
Timing information for at least some of the functions seems to make sense, but I would expect if_detach_internal() to be the last entry in the list, with nothing taking longer than it, since this function is at the top of the call tree I'm trying to profile.

Clearly, that is not the case, as I'm also getting measurements for other functions (uma_zfree_arg(), swi_sched(), etc.) with seemingly absurd execution times. These results completely destroy my trust in everything else DTrace tells me here.

What am I missing? Is this approach sound at all?
I'll prefix my comments with the fact that I've not used DTrace on FreeBSD, only on macOS/OS X, so there might be something platform-specific at play here that I'm not aware of. With that out of the way:
- I'm slightly uneasy about your use of the global associative array t. You might want to make it thread-local (self->t), because as it stands, your code can produce junk results if if_detach_internal is called from multiple threads simultaneously.
- The use of the global dt variable is similarly dangerous and thread-unsafe. It really should be this->dt everywhere (a clause-local variable).
无处不在(子句局部变量)。fbt:kernel::entry /self->traceme/
will be invoked for if_detach_internal
itself .fbt:kernel::entry /self->traceme/
将被调用if_detach_internal
本身。 This is because the latter function of course matches the wildcard, and actions are executed in the order in which they appear in the script, so by the time the predicate on the wildcard entry
action is checked, the non-wildcard action will have set self->traceme = 1;
entry
动作的谓词时,非通配符动作将设置self->traceme = 1;
Double-setting the timestamp like this should cause no ill effects, but judging by the way the code is written, you may have been unaware that this is in fact what it does, which could cause problems if you make further changes down the line. Unfortunately, DTrace scoping rules are rather unintuitive, in that everything is global and thread-unsafe by default.不幸的是,DTrace 范围规则相当不直观,因为默认情况下一切都是全局的且线程不安全的。 And yes, this still bites me every now and then, even after having written a fair amount of DTrace script code.
是的,即使在编写了大量的 DTrace 脚本代码之后,这仍然时不时地困扰着我。
I don't know if following the above advice will fix your problem entirely; if not, please update your question accordingly and drop me a comment below and I'll take another look.
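For illustration, applying those two fixes to the original script might look something like the following. This is an untested sketch, not a verified fix: the timestamps move into a thread-local associative array (self->t), the delta becomes clause-local (this->dt), and the clause clearing the flag is placed last so the wildcard return clause still records if_detach_internal's own time.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option dynvarsize=256m

fbt:kernel:if_detach_internal:entry
{
    self->traceme = 1;
    self->t[probefunc] = timestamp;
}

/*
 * The wildcard also matches if_detach_internal itself; the resulting
 * double-set of the timestamp is harmless, as noted above.
 */
fbt:kernel::entry
/self->traceme/
{
    self->t[probefunc] = timestamp;
}

fbt:kernel::return
/self->traceme && self->t[probefunc]/
{
    this->dt = timestamp - self->t[probefunc];
    @ft[probefunc] = sum(this->dt);
    self->t[probefunc] = 0;
}

/*
 * Placed after the wildcard return clause: clauses for the same probe
 * fire in script order, so the time for if_detach_internal is recorded
 * before the flag is cleared.
 */
fbt:kernel:if_detach_internal:return
{
    self->traceme = 0;
}
```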
This is another variation of a really simple but extremely useful DTrace script that I've often used to find out where any kernel is actually spending most of its time:
#!/usr/sbin/dtrace -s

profile:::profile-1001hz
/arg0/
{
    @[ stack() ] = count();
}
That profiles the kernel's stack traces, and when the script exits via CTRL-C or some other method, it will print something like this:
              .
              .
              .
              unix`z_compress_level+0x9a
              zfs`zfs_gzip_compress+0x4e
              zfs`zfs_compress_data+0x8c
              zfs`zio_compress+0x9f
              zfs`zio_write_bp_init+0x2b4
              zfs`zio_execute+0xc2
              genunix`taskq_thread+0x3ad
              unix`thread_start+0x8
              703

              unix`deflate_slow+0x8a
              unix`z_deflate+0x75a
              unix`z_compress_level+0x9a
              zfs`zfs_gzip_compress+0x4e
              zfs`zfs_compress_data+0x8c
              zfs`zio_compress+0x9f
              zfs`zio_write_bp_init+0x2b4
              zfs`zio_execute+0xc2
              genunix`taskq_thread+0x3ad
              unix`thread_start+0x8
              1708

              unix`i86_mwait+0xd
              unix`cpu_idle_mwait+0x1f3
              unix`idle+0x111
              unix`thread_start+0x8
              86200
That's an example set of stack traces and the number of times each stack trace was sampled. Note that it prints the most frequent stack traces last, so you can immediately see the stack trace(s) most frequently sampled - which is going to be where the kernel is spending a lot of time.

Note also that the stack traces are printed in what you may think is reverse order - the outer, topmost call is printed last.
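If the output is overwhelming on a busy kernel, one possible refinement (a sketch I haven't run on FreeBSD) is to keep only the most frequently sampled stacks using DTrace's trunc() action on the aggregation before it is printed:

```d
#!/usr/sbin/dtrace -s

profile:::profile-1001hz
/arg0/    /* arg0 != 0: the sample was taken in kernel mode */
{
    @[ stack() ] = count();
}

END
{
    /* Keep only the ten most frequently sampled stacks. */
    trunc(@, 10);
}
```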