
Cache line size

This might be a very common and simple question, but I need some explanation of the curve I just obtained from a cache benchmark. The goal here is to find the cache line size. I used the code from here: (https://github.com/jiewmeng/cs3210-assign1/blob/master/cache-l1-line.cpp)

This is the curve I obtained by running the code on my machine (a MacBook Pro with a Core i7; the cache line size is 64 bytes and the L1 data cache is 32 KB).

[Figure: Time vs. stride size]

  • I think the peak happens at 128 bytes, not at 64 bytes. If that is true, why?
  • Why does the time drop at 512 bytes?
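For reference, the kind of stride benchmark the question describes can be sketched in C++ as below. This is a minimal sketch, not the linked cs3210 code: `ns_per_access` and `g_sink` are hypothetical names, and it assumes the buffer is much larger than the cache being probed, so that misses dominate the timing.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Receives the checksum so the compiler cannot drop the loads.
volatile std::size_t g_sink = 0;

// Hypothetical helper: reads one byte every `stride` bytes of `buf`,
// wrapping at the end, for `accesses` reads, and returns the average
// time per read in nanoseconds.
double ns_per_access(const std::vector<std::uint8_t>& buf, std::size_t stride,
                     std::size_t accesses) {
    assert(stride > 0 && stride <= buf.size() && accesses > 0);
    const std::size_t n = buf.size();
    std::size_t idx = 0;
    std::size_t sum = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < accesses; ++i) {
        sum += buf[idx];
        idx += stride;
        if (idx >= n) idx -= n;  // cheap wrap; valid because stride <= n
    }
    const auto t1 = std::chrono::steady_clock::now();
    g_sink = sum;
    const std::chrono::duration<double, std::nano> elapsed = t1 - t0;
    return elapsed.count() / static_cast<double>(accesses);
}
```

In the simplest model, sweeping the stride over 1, 2, 4, ..., 1024 makes the time per access grow until the stride reaches the line size (each read then touches a new line) and roughly level off after that; prefetchers and TLB effects distort this in practice.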

Update:

I also ran code to determine the size of the L1 and L2 caches. Here is the figure, just to document the data. As you can see, there are two peaks, at 32 KB (L1 cache size) and 256 KB (L2 cache size).
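One common way to measure cache sizes is a random pointer chase over a growing working set. The sketch below swaps in that technique (it is not the code used for the figure): `make_chase`, `chase_ns`, `Line` and `g_sink` are hypothetical names, and it assumes 64-byte lines so each hop touches a different line.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Receives the final index so the chase cannot be optimized away.
volatile std::size_t g_sink = 0;

// One entry per (assumed) 64-byte cache line, so every hop touches a new line.
struct alignas(64) Line {
    std::size_t next;
};

// Sattolo's algorithm: a random permutation that forms a single cycle,
// so chasing next[] from any slot visits every slot before repeating.
std::vector<std::size_t> make_chase(std::size_t slots, std::uint32_t seed) {
    assert(slots >= 2);
    std::vector<std::size_t> next(slots);
    for (std::size_t j = 0; j < slots; ++j) next[j] = j;
    std::mt19937 rng(seed);
    for (std::size_t j = slots - 1; j > 0; --j) {
        std::uniform_int_distribution<std::size_t> pick(0, j - 1);  // k < j
        std::swap(next[j], next[pick(rng)]);
    }
    return next;
}

// Average nanoseconds per dependent load over a working set of `bytes`.
double chase_ns(std::size_t bytes, std::size_t hops, std::uint32_t seed = 1) {
    assert(hops > 0);
    const std::size_t slots = bytes / sizeof(Line);
    const std::vector<std::size_t> order = make_chase(slots, seed);
    std::vector<Line> lines(slots);
    for (std::size_t j = 0; j < slots; ++j) lines[j].next = order[j];

    std::size_t p = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t h = 0; h < hops; ++h) p = lines[p].next;  // each load waits on the last
    const auto t1 = std::chrono::steady_clock::now();
    g_sink = p;
    const std::chrono::duration<double, std::nano> elapsed = t1 - t0;
    return elapsed.count() / static_cast<double>(hops);
}
```

Sweeping `bytes` from a few KB up to well past the expected L3 size (say, 64 MB), with `hops` much larger than the slot count so cold misses are amortized, should show a step in ns per hop near each cache capacity. Because the visiting order is random, the hardware prefetchers have no pattern to follow; at large sizes, TLB misses also add to the measured latency.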

Question:

I am wondering if there is any way to find the size of the shared L3 cache.

[Figure: Cache size]

Thanks

I'm guessing that the 128 B peak is most likely due to spatial prefetching. From Intel's Optimization guide, section 2.1.5.4:

This prefetcher strives to complete every cache line fetched to the L2 cache with the pair line that completes it to a 128-byte aligned chunk.

It wouldn't be a clean jump, since this prefetcher does not always fire, and even when it does, it only prefetches into the L2; still, that is much better than fetching from memory. To confirm that this is the cause, you can disable the prefetchers (through the BIOS or other means, although some systems may not support that) and check again.
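The pairing rule in the quote can be made concrete with address arithmetic. With a 64-byte stride, the next access usually falls in the pair line that may already have been prefetched. With a 128-byte stride, every access lands in a new 128-byte chunk, so that help disappears, which is consistent with the peak moving to 128 bytes. A minimal sketch, assuming 64-byte lines; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Assumes 64-byte lines, as on the Core i7 in the question.
constexpr std::uintptr_t kLine = 64;

// Start of the 64-byte line containing `addr`.
constexpr std::uintptr_t line_of(std::uintptr_t addr) { return addr & ~(kLine - 1); }

// The "pair line": the other half of the 128-byte aligned chunk that the
// spatial prefetcher tries to complete.
constexpr std::uintptr_t pair_line_of(std::uintptr_t addr) { return line_of(addr) ^ kLine; }

// Start of the 128-byte aligned chunk containing `addr`.
constexpr std::uintptr_t chunk_of(std::uintptr_t addr) { return addr & ~(2 * kLine - 1); }

// Stride 64: consecutive lines share a chunk. Stride 128: they never do.
static_assert(chunk_of(0x1000) == chunk_of(0x1000 + 64), "same chunk");
static_assert(chunk_of(0x1000) != chunk_of(0x1000 + 128), "new chunk");
```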

As for the L3 size: you didn't specify your exact model, but I'm guessing you have more than 4 MB of L3. Just keep the curve going and see where it jumps.
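Once the sweep extends past the L3, the boundaries can be picked out of the measured data by looking for jumps between consecutive sizes. The helper below is a hypothetical heuristic, not part of either benchmark: `Sample`, `capacity_estimates` and the default 1.5× threshold are assumptions, and real curves are noisier than the test data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample {
    std::size_t kib;  // working-set size in KiB
    double ns;        // measured time per access
};

// Reports the last working-set size before each point where the time per
// access grows by more than `ratio` between consecutive samples.
// Those sizes are rough capacity estimates for successive cache levels.
std::vector<std::size_t> capacity_estimates(const std::vector<Sample>& s,
                                            double ratio = 1.5) {
    assert(ratio > 1.0);
    std::vector<std::size_t> out;
    for (std::size_t j = 1; j < s.size(); ++j) {
        if (s[j].ns > ratio * s[j - 1].ns) out.push_back(s[j - 1].kib);
    }
    return out;
}
```

Samples spaced by factors of two (as in the question's figures) only bound each capacity to within a factor of two; adding intermediate sizes near a jump narrows it down.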

EDIT

I just noticed another thing: your k*i expression is probably overflowing int at the top of the range, which means your access pattern might not be cyclic as you expect.
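One way to avoid the overflow is to widen the operands before multiplying and wrap the result explicitly. A minimal sketch, assuming `k` and `i` are non-negative (for example a stride and an iteration counter; their exact roles in the linked code are an assumption) and that the index must stay in `[0, n)`; `wrapped_index` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// With 32-bit int, k * i overflows (undefined behavior) once the product
// passes INT_MAX (about 2.1e9). Converting both operands to 64 bits before
// multiplying, then wrapping with %, keeps the index in [0, n) so the
// access pattern stays cyclic.
std::uint64_t wrapped_index(std::uint64_t k, std::uint64_t i, std::uint64_t n) {
    assert(n > 0);
    return (k * i) % n;
}
```

For example, 1048576 * 4096 is 2^32, which no longer fits in a 32-bit int, but `wrapped_index(1048576, 4096, 3000)` is computed in 64 bits and yields 2296.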

My BusSpeed benchmark was intended to identify cache sizes and performance at different strides, and to show burst reading on buses:

http://www.roylongbottom.org.uk/busspd2k%20results.htm

Following are results on a Core i7 with 8 MB of L3:

  Memory  Reg2  Reg2  Reg2  Reg2  Reg1  Reg2  Reg1  Reg2  Reg1  Reg8
  KBytes Inc64 Inc32 Inc16  Inc8  Inc4  Inc4  Inc4  Inc4  Inc8  Inc8
   Used   MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S  MB/S

      4  10025 10800 11262 11498 11612 11634  5850 11635 23093 23090
      8  10807 11267 11505 11627 11694 11694  5871 11694 23299 23297
     16  11251 11488 11620 11614 11712 11719  5873 11718 23391 23398
     32   9893  9853 10890 11170 11558 11492  5872 11466 21032 21025
     64   3219  4620  7289  9479 10805 10805  5875 10797 14426 14426
    128   3213  4805  7305  9467 10811 10810  5875 10805 14442 14408
    256   3144  4592  7231  9445 10759 10733  5870 10743 14336 14337
    512   2005  3497  5980  9056 10466 10467  5871 10441 13906 13905
   1024   2003  3482  5974  9017 10468 10466  5874 10467 13896 13818
   2048   2004  3497  5958  9088 10447 10448  5870 10447 13857 13857
   4096   1963  3398  5778  8870 10328 10328  5851 10328 13591 13630
   8192   1729  3045  5322  8270  9977  9963  5728  9965 12923 12892
  16384    692  1402  2495  4593  7811  7782  5406  7848  8335  8337
  32768    695  1406  2492  4584  7820  7826  5401  7792  8317  8322
  65536    695  1414  2488  4584  7823  7826  5403  7800  8321  8321
 131072    696  1402  2491  4575  7827  7824  5411  7846  8322  8323
 262144    696  1413  2498  4594  7791  7826  5409  7829  8333  8334
 524288    693  1416  2498  4595  7841  7842  5411  7847  8319  8285
1048576    704  1415  2478  4591  7845  7840  5410  7853  8290  8283

                  End of test Fri Jul 30 16:44:29 2010

  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000106A5
  Intel(R) Core(TM) i7 CPU         930  @ 2.80GHz Measured 2807 MHz
