
Compiling with gcc -O3 technically increases my cache miss rate

I've been profiling a bit with cachegrind and noticed something. When I compile with -O3 I get fewer data fetches but the same number of cache misses, which results in a higher miss rate. That's fine, but it seems like an odd thing to me and I'd like to know what's going on behind the scenes. The only other relevant compiler option I have turned on is -march=native. For comparison,

Without -O3

==16951== D   refs:        923,170,681  (817,941,424 rd   + 105,229,257 wr)
==16951== D1  misses:        9,477,102  (  8,115,150 rd   +   1,361,952 wr)
==16951== LLd misses:          647,219  (    262,227 rd   +     384,992 wr)
==16951== D1  miss rate:           1.0% (        1.0%     +         1.3%  )
==16951== LLd miss rate:           0.1% (        0.0%     +         0.4%  )

With -O3

==16978== D   refs:      218,804,125  (205,979,405 rd   + 12,824,720 wr)
==16978== D1  misses:      9,372,533  (  8,016,083 rd   +  1,356,450 wr)
==16978== LLd misses:        647,195  (    262,191 rd   +    385,004 wr)
==16978== D1  miss rate:         4.3% (        3.9%     +       10.6%  )
==16978== LLd miss rate:         0.3% (        0.1%     +        3.0%  )

It's most likely due to vectorization:

-O3

Optimize yet more. -O3 turns on all optimizations specified by -O2
and also turns on ... -ftree-vectorize and -fipa-cp-clone options. 

(from the GCC manpage).
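To make the arithmetic concrete: cachegrind counts one D ref for every data access executed, but charges a miss per cache line it has to bring in. Optimized, vectorized code reads the same bytes with far fewer, wider load/store instructions, so the D-ref denominator shrinks (923M down to 219M in the runs above) while the cache lines touched, and therefore the misses, stay essentially the same (about 9.4M D1 and 647K LLd misses in both runs). The ratio, i.e. the miss rate, goes up even though the program is not using the cache any worse. Below is a minimal sketch of the kind of loop where this shows up; the file name, array size, loop body and build commands are illustrative assumptions, not the asker's actual code.

/* Illustrative only -- built and profiled, for example, as:
 *   gcc -march=native -o vec vec.c        (baseline, like the first run)
 *   gcc -O3 -march=native -o vec vec.c    (like the second run)
 *   valgrind --tool=cachegrind ./vec
 */
#include <stddef.h>
#include <stdio.h>

#define N (1 << 22)                 /* 4M floats per array, ~16 MB each:
                                       far bigger than the last-level cache */

static float a[N], b[N], c[N];

int main(void)
{
    /* Unoptimized, each iteration issues scalar 4-byte loads/stores, so
     * cachegrind counts one D ref per element per array.  With
     * -O3 -march=native GCC typically turns this into 32-byte AVX
     * loads/stores that cover 8 elements at a time, cutting the counted
     * refs by roughly 8x -- but the 64-byte cache lines that must be
     * fetched (and hence the misses) are exactly the same. */
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] * b[i] + 1.0f;

    printf("%f\n", c[N - 1]);       /* keep the result observable */
    return 0;
}

With 4-byte floats and 64-byte cache lines, the loop incurs one compulsory miss per 16 elements of each array no matter how many instructions read them, which is why the absolute miss counts barely move between the two builds while the rate jumps.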
