
What is the price of a cache miss?

I'm analyzing some code and using cachegrind to get the number of cache misses (L2 and L3) during execution.

My question is: how do I determine the time spent waiting for the cache to get ready, based on the cache misses?

I would like to be able to say something like, "my code gets 90% CPU utilization."

Is it possible to do this based on the cachegrind output?

Cachegrind simply simulates execution on a CPU, emulating how the cache and branch predictor might behave. Knowing how long you would spend blocked on the cache requires a lot more information. Specifically, you need to know when execution can be speculated and how many instructions can be dispatched in parallel (as well as how memory accesses can be coordinated simultaneously). Cachegrind can't do this, and any tool that could would depend heavily on the processor (whereas cache miss counts are much less processor dependent).
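To illustrate why a miss count alone does not pin down the waiting time, here is a toy model in Python. It is only a sketch: the function name `stall_cycles`, the 200-cycle miss latency, and the idea of summarizing parallelism as a single "average misses in flight" number are all assumptions made for the example, not something cachegrind or any real CPU reports.

```python
def stall_cycles(misses: int, miss_latency: int, overlap: float) -> float:
    """Toy estimate of cycles spent waiting on memory.

    overlap is the assumed average number of misses in flight at once
    (1 means every miss is fully serialized).
    """
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    return misses * miss_latency / overlap


MISSES = 1_000_000   # same miss count in every scenario
MISS_LATENCY = 200   # assumed cycles per miss, purely illustrative

for overlap in (1, 4, 10):
    cycles = stall_cycles(MISSES, MISS_LATENCY, overlap)
    print(f"overlap={overlap:>2}: about {cycles:,.0f} stall cycles")
```

The same one million misses cost ten times less waiting when ten of them can be outstanding at once, and how much overlap you actually get depends on the processor and on the code's dependency chains.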

If you have access to a modern Intel CPU, I'd recommend getting a free copy of VTune (for non-commercial purposes) and seeing what it says. It can tell the processor to collect data on cache misses and report it back to you, so you can see what actually happened rather than just simulating it. It will give you a clocks-per-instruction figure for each line of code, and from this you can see which lines are blocking on the cache (and for how long). It can also give you all the other information cachegrind can.

You can get it here:

http://software.intel.com/en-us/articles/non-commercial-software-download/

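The only way to know for sure is to measure on the specific CPU using its performance monitoring counters. Even then, the results are very specific to that machine, and any optimization based on them may perform very badly on CPUs with different cache sizes, bus architectures, or memory configurations.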

A variable can be fetched from the cache in a few clock cycles.

It can take more than one hundred clock cycles to fetch it from RAM if it isn't in the cache.
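Those two figures can be turned into a very rough back-of-the-envelope estimate from cachegrind-style counts. The sketch below is an assumption-laden model, not a measurement: the class `CacheCounts`, the function `estimate_cycles`, the one-cycle-per-instruction baseline, and the 10 and 100 cycle penalties are all made up for illustration, and the model ignores any overlap between misses and other work. The sample counts are invented as well.

```python
from dataclasses import dataclass


@dataclass
class CacheCounts:
    instructions: int  # instructions executed
    l1_misses: int     # first-level cache misses (instruction + data)
    ll_misses: int     # last-level cache misses, i.e. accesses that go to RAM


def estimate_cycles(counts: CacheCounts,
                    l1_penalty: int = 10,
                    ll_penalty: int = 100) -> tuple[int, float]:
    """Return (estimated total cycles, fraction of cycles not stalled).

    Assumes one cycle per instruction and fully serialized misses,
    so the stall part is a pessimistic guess.
    """
    stall = counts.l1_misses * l1_penalty + counts.ll_misses * ll_penalty
    total = counts.instructions + stall
    return total, counts.instructions / total


# Invented numbers standing in for a cachegrind summary.
sample = CacheCounts(instructions=1_000_000, l1_misses=20_000, ll_misses=1_000)
total, busy = estimate_cycles(sample)
print(f"estimated cycles: {total:,}, busy fraction: {busy:.1%}")
```

A figure like this is useful for comparing two versions of the same code under the same assumptions. It should not be read as the real utilization of a particular CPU, which only hardware counters can show.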
