
Iteration speed and element size

I have a std::vector filled with the following structures:

#define ELEMENTSIZE 8

struct Element {
    int value;
    char size[ELEMENTSIZE - 4]; // padding: 1 char is 1 B, minus the 4 B int
};

The size of the structure depends on the ELEMENTSIZE define, which sets the length of the char array inside the structure.
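The relation between ELEMENTSIZE and the actual size of the structure can be checked at compile time. The following is only a sketch: the templated `Element<ElementSize>` is a hypothetical variant of the structure above (so that several sizes can coexist in one program), and it assumes a platform where `int` is 4 bytes with 4-byte alignment, as the question does.

```cpp
#include <cstddef>

// Hypothetical templated variant of the question's Element.
// Assumes sizeof(int) == 4 and alignof(int) == 4.
template <std::size_t ElementSize>
struct Element {
    static_assert(ElementSize > sizeof(int),
                  "need room for at least one byte in the char array");
    static_assert(ElementSize % alignof(int) == 0,
                  "otherwise the compiler adds tail padding");
    int value;
    char size[ElementSize - sizeof(int)];
};

// On such a platform the structure is exactly as large as requested.
static_assert(sizeof(Element<8>) == 8, "8 B element");
static_assert(sizeof(Element<128>) == 128, "128 B element");
```

If ELEMENTSIZE is not a multiple of the alignment of `int`, the compiler rounds the structure up, so the real element size can be larger than the define suggests.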

I am benchmarking the computation of the average of value over these structures in the vector, and I would like to know why a vector filled with larger structures takes longer to iterate over.

For example, a vector with 1 000 000 8 B structures takes roughly 1,7 ms, and the same test with 128 B structures takes 12,7 ms.
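The benchmark code itself is not shown, so the following is only a minimal sketch of how such a measurement might look. The names `average` and `time_average_ms`, the templated `Element`, the fill pattern, and the use of `std::chrono::steady_clock` are all assumptions, not the original test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical templated variant of the question's Element (assumes 4-byte int).
template <std::size_t ElementSize>
struct Element {
    int value;
    char size[ElementSize - sizeof(int)];
};

// Average of `value` over the whole vector; only 4 bytes of each element are read.
template <std::size_t ElementSize>
double average(const std::vector<Element<ElementSize>>& v) {
    assert(!v.empty());
    long long sum = 0;
    for (const auto& e : v) sum += e.value;
    return static_cast<double>(sum) / static_cast<double>(v.size());
}

// Builds a vector of `count` elements and returns the time of one averaging
// pass in milliseconds. The average is written to `result` so that the loop
// has an observable effect and is not optimized away.
template <std::size_t ElementSize>
double time_average_ms(std::size_t count, double& result) {
    std::vector<Element<ElementSize>> v(count);
    for (std::size_t i = 0; i < count; ++i) v[i].value = static_cast<int>(i % 100);
    const auto start = std::chrono::steady_clock::now();
    result = average<ElementSize>(v);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_average_ms<8>(1000000, r)` and `time_average_ms<128>(1000000, r)` in the same program would give the two timings to compare. The absolute numbers depend on the machine, the compiler flags, and whether the data is already in cache.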

Is that big difference due to the cache only? If so, could you explain why? Or is there some other aspect that I am not seeing?

The structure is 16 times larger, so iterating over the vector touches 16 times as much memory, and a slowdown of up to 16 times would not be surprising. The measured ratio is 12,7/1,7 = 7,47, which is the same order of magnitude, although only about half of that naive estimate.

Now imagine that the vector containing the 128 B elements were instead a vector of 8 B elements with the same total size: it would hold 16 000 000 elements. Do you see now that it really is 16 times larger?
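The arithmetic behind that thought experiment can be written out as a small sketch. The helper names `footprint_bytes` and `elements_in` are hypothetical and exist only for illustration.

```cpp
#include <cstddef>

// Total number of bytes occupied by the elements of a vector.
constexpr std::size_t footprint_bytes(std::size_t count, std::size_t element_size) {
    return count * element_size;
}

// How many elements of `element_size` bytes fit in `total_bytes`.
constexpr std::size_t elements_in(std::size_t total_bytes, std::size_t element_size) {
    return total_bytes / element_size;
}

constexpr std::size_t kCount = 1000000;  // element count from the question

static_assert(footprint_bytes(kCount, 8) == 8000000, "about 8 MB");
static_assert(footprint_bytes(kCount, 128) == 128000000, "about 128 MB");
static_assert(elements_in(footprint_bytes(kCount, 128), 8) == 16 * kCount,
              "the same memory would hold 16 million 8 B elements");
```

An 8 MB array may partly fit in the last-level cache of some processors, while a 128 MB array usually will not, which is one reason the two runs are not directly comparable.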

The OS must bring the larger structures into memory, which may take this path:

  • From virtual memory to main memory (L4)
  • From main memory to L3, then to L2 and L1, up to the processor (if the data needs processing)
  • At the L1 or processor level, the contents have to be copied around for the iterator object being used. This largely depends on cache performance.
  • Now, at each iteration, the cost depends on what operation you are performing with the iterator. If content is being copied or displayed on screen, or some sorting/compression is being performed, more time (from the CPU to L4 if needed) would be required.
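Data moves along the path above one cache line at a time, so a rough estimate of the work is the number of distinct cache lines the loop touches. The sketch below assumes 64-byte cache lines (common on x86-64, but not stated in the question), an array that starts on a line boundary, an element size that is a multiple of 4, and a loop that reads only the 4-byte `value` at the start of each element. The function name `lines_touched` is hypothetical, and hardware prefetching is ignored.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Number of distinct cache lines touched when reading the 4-byte `value`
// at offset 0 of each of `count` contiguous elements.
// - element_size >= 64: every `value` lies in its own line, so `count` lines.
// - element_size <  64: every line up to the one holding the last `value`
//   is touched, so the count is the index of that line plus one.
constexpr std::size_t lines_touched(std::size_t count, std::size_t element_size) {
    return count == 0
        ? 0
        : (element_size >= kCacheLine
               ? count
               : ((count - 1) * element_size) / kCacheLine + 1);
}

static_assert(lines_touched(1000000, 8) == 125000, "8 elements share one line");
static_assert(lines_touched(1000000, 128) == 1000000, "one line per element");
```

Under these assumptions the 128 B case touches 8 times as many cache lines as the 8 B case, not 16 times, because the second half of each 128 B element is never read. That factor is close to the measured 7,47, although this simple model is only one possible explanation.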

If all of this is happening, why would a 128-byte structure not take more time than an 8-byte structure?
