How to analyze a program's running time

Question

I am trying to optimize a C++ program's performance and reduce its run time. However, I am having trouble figuring out where the bottleneck is.

The time command shows that the program takes about 5 minutes to run, and of those 5 minutes, user CPU time accounts for 4.5 minutes.

CPU profilers (both the gcc profiler and Google perftools) show that the function calls take only 60 seconds of CPU time in total. I also tried using the profiler to sample real time instead of CPU time, and it gave me similar results.

An I/O profiler (I used ioapps) also shows that I/O takes only about 30 seconds of the program's running time.

So basically I have 3.5 minutes (the bulk of the program's running time) unaccounted for, and I believe that is where the bottleneck is.

What did I miss, and how do I find out where that time goes?

Answer 1

As Öö Tiib suggested, just break the program in a debugger. The way I do it is to get the program running, switch to the output window, type Ctrl-C to interrupt the program, switch back to the GDB window, type "thread 1" so as to be in the context of the main program, and type "bt" to see the stack trace.

Now, look at the stack trace and understand it, because while the instruction at the program counter is responsible for that particular cycle being spent, so is every call on the stack.

If you do this a few times, you're going to see exactly which line is responsible for the bottleneck. As soon as you see it on two (2) samples, you've nailed it. Then fix it and do it all again, finding the next bottleneck, and so on. You could easily find that you get an enormous speedup this way.

<flame>

Some people say this is exactly what profilers do, only they do it better. That's what you hear in lecture halls and on blogs, but here's the deal: there are ways to speed up your code that do not reveal themselves as "slow functions" or "hot paths", for example, reorganizing the data structure. Every function looks more or less innocent, even if it has a high inclusive time percentage.

They do reveal themselves if you actually look at stack samples. So the problem with good profilers is not in the collection of samples, it is in the presentation of results. Statistics and measurements cannot tell you what a small selection of samples, examined carefully, can.

What about the issue of a small vs. a large number of samples? Aren't more better? OK, suppose you have an infinite loop, or, if not infinite, one that just runs far longer than you know it should. Would 1000 stack samples find it any better than a single sample? (No.) If you look at it under a debugger, you know you're in the loop because it takes basically 100% of the time. It's on the stack somewhere; just scan up the stack until you find it. Even if the loop takes only 50% or 20% of the time, that's the probability that each sample will see it. So, if you see something you could get rid of on as few as two samples, it's worth doing it. So, what do the 1000 samples buy you?
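To put a rough number on "as few as two samples", here is a back-of-the-envelope calculation. It assumes (my assumption, not something stated above) that the samples are independent and that the problem is on the stack for a fixed fraction f of the running time. The symbols f and n are introduced here purely for illustration.

```latex
% Probability that a problem occupying fraction f of the time
% shows up on at least two of n independent stack samples:
\[
  \Pr[\text{seen on} \ge 2 \text{ of } n \text{ samples}]
    = 1 - (1-f)^{n} - n f (1-f)^{n-1}
\]
% Worked values:
%   f = 0.5, n = 10:  1 - 0.5^{10} - 10 \cdot 0.5 \cdot 0.5^{9} \approx 0.989
%   f = 0.2, n = 10:  1 - 0.8^{10} - 10 \cdot 0.2 \cdot 0.8^{9} \approx 0.624
%   f = 0.2, n = 20:  1 - 0.8^{20} - 20 \cdot 0.2 \cdot 0.8^{19} \approx 0.931
```

Under those assumptions, ten or twenty samples are already very likely to show a 20% to 50% problem twice, which is consistent with the point that a thousand samples add little for problems of this size.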

Maybe one thinks: "So what if we miss a problem or two? Maybe it's good enough." Well, is it? Suppose the code has three problems: P taking 50% of the time, Q taking 25%, and R taking 12.5%. The good stuff is called A. This shows the speedup you get if you fix one of them, two of them, or all three of them.

PRPQPQPAPQPAPRPQ original time with avoidable code P, Q, and R all mixed together
RQQAQARQ         fix P           - 2 x   speedup
PRPPPAPPAPRP     fix Q           - 1.3 x    "
PPQPQPAPQPAPPQ   fix R           - 1.14 x   "
RAAR             fix P and Q     - 4 x      "
QQAQAQ           fix P and R     - 2.7 x    "
PPPPAPPAPP       fix Q and R     - 1.6 x    "
AA               fix P, Q, and R - 8 x   speedup

Does this make it clear why the ones that "get away" really hurt? The best you can do if you miss any is twice as slow.

They are easy to find if you examine samples. P is on half the samples. If you fix P and do it again, Q is on half the samples. Once you fix Q, R is on half the samples. Fix R and you've got your 8x speedup. You don't have to stop there. You can keep going until you truly can't find anything to fix.

The more problems there are, the higher the potential speedup, but you can't afford to miss any. The problem with profilers (even good ones) is that, by denying you the chance to see and study individual samples, they hide problems that you need to find. More on all that. For the statistically inclined, here's how it works.

There are good profilers. The best are wall-time stack samplers that report inclusive percent at individual lines, letting you turn sampling on and off with a hot-key. Zoom (wiki) is such a profiler.

But even those make the mistake of assuming you need lots of samples. You don't, and the price you pay for them is that you can't actually see any, so you can't see why the time is being spent, so you can't easily tell if it's necessary, and you can't get rid of something unless you know you don't need it. The result is that you miss bottlenecks, and they end up stunting your speedup.

</flame>

Solution 1: 7, accepted, 2013-08-13 19:32:40
