C++ STL: Array vs Vector: Raw element accessing performance

I'm building an interpreter, and since I'm aiming for raw speed this time, every clock cycle matters to me in this case.

Do you have any experience or information about which of the two is faster: vector or array? All that matters is how fast I can access an element (fetching opcodes). I don't care about insertion, allocation, sorting, etc.

I'm going to go out on a limb here and say:

  • Arrays are at least a bit faster than vectors in terms of accessing an element i.

It seems really logical to me. With vectors you have all that safety and bookkeeping overhead, which doesn't exist for arrays.

(Why) Am I wrong?

No, I can't ignore the performance difference, even if it is tiny. I have already optimized and minimized every other part of the VM that executes the opcodes :)

Element access time in a typical implementation of std::vector is the same as element access time for an ordinary array accessed through a pointer object (i.e. a run-time pointer value):

std::vector<int> v;
int *pa;
...
v[i];
pa[i]; 
// Both have the same access time

However, access to an element of an array available as an array object is faster than both of the above (it is equivalent to access through a compile-time pointer value):

int a[100];
...
a[i];
// Faster than both of the above

For example, a typical read access to an int array available through a run-time pointer value will look as follows in compiled code on the x86 platform:

// pa[i]
mov ecx, pa // read pointer value from memory
mov eax, i
mov <result>, dword ptr [ecx + eax * 4]

Access to a vector element will look pretty much the same.

A typical access to a local int array available as an array object will look as follows:

// a[i]
mov eax, i
mov <result>, dword ptr [esp + <offset constant> + eax * 4]

A typical access to a global int array available as an array object will look as follows:

// a[i]
mov eax, i
mov <result>, dword ptr [<absolute address constant> + eax * 4]

The difference in performance arises from the extra mov instruction in the first variant, which has to make an extra memory access.
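To check this on your own compiler, you can compile a minimal sketch like the one below with optimization and assembly output enabled (for example `g++ -O2 -S`) and compare the three functions. The function names and the global `g_table` are hypothetical. The result depends on the compiler, flags and ABI. On x86-64, for instance, a pointer argument usually arrives in a register, so `read_via_pointer` may not need the extra load at all. `read_via_vector`, by contrast, still has to load the data pointer out of the vector object first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical global array object: its address is fixed at link time.
int g_table[100];

// Base address comes from a run-time pointer value.
int read_via_pointer(const int *pa, std::size_t i) {
    return pa[i];
}

// Base address must first be loaded from inside the vector object.
int read_via_vector(const std::vector<int> &v, std::size_t i) {
    return v[i];
}

// Base address is a constant; only the scaled index is computed at run time.
int read_via_global(std::size_t i) {
    assert(i < 100);  // debug-only precondition check
    return g_table[i];
}
```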

However, the difference is negligible, and in a multiple-access context it is easily optimized to the point of being exactly the same (by loading the target address into a register).
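You can also write the "load the base address into a register once" step out by hand. This is only a sketch (`sum_hoisted` is a hypothetical name). Optimizing compilers usually perform the same hoisting on their own, so the explicit version mainly makes the idea visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies the vector's data pointer and size into locals once, so the loop body
// indexes a raw pointer (the same addressing form as a[i] or pa[i]).
long long sum_hoisted(const std::vector<int> &v) {
    const int *base = v.data();      // base address loaded once, outside the loop
    const std::size_t n = v.size();
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += base[i];
    return total;
}
```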

So the statement that "arrays are a bit faster" is correct in the narrow case where the array is accessed directly through the array object, not through a pointer object. But the practical value of that difference is virtually nothing.

You may be barking up the wrong tree. Cache misses can be much more important than the number of instructions that get executed.

No. Under the hood, both std::vector and C++0x std::array find the pointer to element n by adding n to the pointer to the first element.
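Both containers store their elements contiguously, so you can check this pointer arithmetic directly. Here is a small sketch with hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// True if element n of the vector sits at "pointer to first element + n".
bool vector_indexing_is_pointer_arithmetic(const std::vector<int> &v, std::size_t n) {
    assert(n < v.size());
    return &v[n] == v.data() + n;
}

// Same check for std::array<int, N>.
template <std::size_t N>
bool array_indexing_is_pointer_arithmetic(const std::array<int, N> &a, std::size_t n) {
    assert(n < N);
    return &a[n] == a.data() + n;
}
```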

vector::at may be slower than array::at because the former must compare against a variable while the latter compares against a constant. Those are the functions that provide bounds checking, not operator[].

If you mean C-style arrays instead of C++0x std::array, then there is no at member, but the point remains.
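The sketch below illustrates the bounds-checking difference (the helper `at_throws` is hypothetical). `at()` throws `std::out_of_range` for an invalid index. For `std::vector` the limit is the run-time `size()`, while for `std::array<T, N>` it is the compile-time constant `N`. The standard does not require `operator[]` to check at all, although some standard libraries add checks in debug modes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if c.at(i) rejects the index by throwing std::out_of_range.
template <typename Container>
bool at_throws(const Container &c, std::size_t i) {
    try {
        (void)c.at(i);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}
```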

EDIT: If you have an opcode table, a global array (such as one with extern or static linkage) may be faster. When a constant is put inside the brackets, elements of a global array are individually addressable like separate global variables, and opcodes are often constants.
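Here is a sketch of the constant-index case, assuming a hypothetical opcode enum and a per-opcode counter table with internal linkage. With a constant index, the element's address is itself a link-time constant. With a run-time index, the compiler still has to add the scaled index to the array's base address.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opcode set, for illustration only.
enum Opcode : unsigned char { OP_PUSH = 0, OP_ADD = 1, OP_HALT = 2, OP_COUNT = 3 };

// Global array with static linkage: its address is known at link time.
static unsigned long g_opcode_hits[OP_COUNT];

void count_add() {
    // Constant index: the element can be addressed directly, like a global variable.
    ++g_opcode_hits[OP_ADD];
}

void count_opcode(Opcode op) {
    assert(op < OP_COUNT);
    ++g_opcode_hits[op];  // run-time index: base address + op * sizeof(element)
}

unsigned long hits(Opcode op) { return g_opcode_hits[op]; }
```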

Anyway, this is all premature optimization. If you don't use any of vector's resizing features, it looks enough like an array that you should be able to convert between the two easily.
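One way to keep that conversion cheap is to write the hot code against a raw pointer and a length. That way, only the code that sets up the storage knows which container is in use. Here is a sketch, with `Code`, `view` and `fetch` as hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view of the bytecode: raw pointer plus length.
struct Code {
    const std::uint8_t *ops;
    std::size_t size;
};

// Fetch used by the interpreter; it never sees the container type.
inline std::uint8_t fetch(const Code &code, std::size_t pc) {
    assert(pc < code.size);  // debug-only check, compiled out with NDEBUG
    return code.ops[pc];
}

// Views over either backing store.
inline Code view(const std::vector<std::uint8_t> &v) { return {v.data(), v.size()}; }

template <std::size_t N>
Code view(const std::array<std::uint8_t, N> &a) { return {a.data(), N}; }
```

In C++20, `std::span<const std::uint8_t>` plays the same role as the hypothetical `Code` struct.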

You're comparing apples to oranges. Arrays have a constant size and are automatically allocated, while vectors have a dynamic size and are dynamically allocated. Which one you use depends on what you need.

Generally, arrays are "faster" to allocate (in quotes because the comparison is meaningless) because dynamic allocation is slower. However, accessing an element should cost the same. (Granted, an array is probably more likely to be in cache, though that doesn't matter after the first access.)

Also, I don't know what "security" you're talking about. Vectors have plenty of ways to produce undefined behavior, just like arrays. They do have at(), but you don't need to use it if you know the index is valid.

Lastly, profile and look at the generated assembly. Nobody's guess is going to solve anything.
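If you want a rough first number before reaching for a real profiler, a minimal timing sketch using `std::chrono::steady_clock` might look like this. All names are hypothetical. A loop like this is easily distorted by the optimizer, frequency scaling and caching, so treat its output only as a hint and confirm it by reading the generated assembly.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs fn() reps times and returns the elapsed wall-clock time in nanoseconds.
template <typename Fn>
long long time_ns(Fn fn, int reps) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

constexpr std::size_t kN = 4096;
std::array<int, kN> g_arr{};    // array object
std::vector<int> g_vec(kN);     // vector of the same size
volatile long long g_sink = 0;  // volatile store so the sums are not discarded

void sum_array() {
    long long s = 0;
    for (std::size_t i = 0; i < kN; ++i) s += g_arr[i];
    g_sink = s;
}

void sum_vector() {
    long long s = 0;
    for (std::size_t i = 0; i < kN; ++i) s += g_vec[i];
    g_sink = s;
}
```

A usage example is `time_ns(sum_array, 1000)` versus `time_ns(sum_vector, 1000)`, run several times and in both orders.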

For decent results, use std::vector as the backing storage and take a pointer to its first element before your main loop or whatever:

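#include <cstdint>
#include <vector>

std::vector<std::uint8_t> mem_buf;   // backing storage for the bytecode
// stuff: fill mem_buf here; resizing it afterwards would invalidate mem
std::uint8_t *mem = mem_buf.data();  // raw pointer to the first element
for (;;) {
    switch (mem[pc]) {               // fetch and dispatch the opcode at pc
    // stuff
    }
}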

This avoids any issues with over-helpful implementations that perform bounds checking in operator[], and it makes single-stepping easier later in the code, because expressions such as mem_buf[pc] no longer have to step into operator[].
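Here is a complete, runnable version of that dispatch loop. It assumes a hypothetical three-opcode bytecode (`OP_PUSH` followed by a one-byte literal, `OP_ADD`, `OP_HALT`) and uses a `std::vector<int>` as the operand stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical three-opcode bytecode, for illustration only.
enum : std::uint8_t { OP_PUSH = 0, OP_ADD = 1, OP_HALT = 2 };

// Bytecode lives in a std::vector; the loop dispatches through a raw pointer
// taken once before the loop, so operator[] is never called inside it.
int run(const std::vector<std::uint8_t> &mem_buf) {
    const std::uint8_t *mem = mem_buf.data();
    std::vector<int> stack;
    std::size_t pc = 0;
    for (;;) {
        switch (mem[pc]) {
        case OP_PUSH:  // push the literal byte that follows the opcode
            stack.push_back(mem[pc + 1]);
            pc += 2;
            break;
        case OP_ADD: {  // pop two values, push their sum
            int b = stack.back(); stack.pop_back();
            int a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            pc += 1;
            break;
        }
        case OP_HALT:  // result is the top of the stack
            assert(!stack.empty());
            return stack.back();
        default:
            assert(false && "unknown opcode");
            return -1;
        }
    }
}
```

For example, the program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT}` returns 5.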

If each instruction does enough work, and the code is varied enough, this should be quicker than using a global array by some negligible amount. (If the difference is noticeable, the opcodes need to be made more complicated.)

Compared to using a global array, on x86 the instructions for this sort of dispatch should be more concise (no 32-bit displacement fields anywhere), and for more RISC-like targets fewer instructions should be generated (no TOC lookups or awkward 32-bit constants), as the commonly used values are all in the stack frame.
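For that comparison, here is the same kind of loop with the bytecode in a hypothetical global buffer `g_code`. Each fetch is addressed from the array's fixed link-time address rather than from a pointer held in a register or the stack frame. Compiling both versions and diffing the dispatch code is a quick way to test the claims above on your own target. For brevity, this sketch uses a fixed-size `int` stack and omits overflow checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical three-opcode bytecode, for illustration only.
enum : std::uint8_t { OP_PUSH = 0, OP_ADD = 1, OP_HALT = 2 };

std::uint8_t g_code[256];  // global bytecode buffer

int run_global() {
    int stack[64];
    int sp = 0;  // number of values currently on the stack
    std::size_t pc = 0;
    for (;;) {
        switch (g_code[pc]) {  // fetch relative to the global array's address
        case OP_PUSH: stack[sp++] = g_code[pc + 1]; pc += 2; break;
        case OP_ADD:  --sp; stack[sp - 1] += stack[sp]; pc += 1; break;
        case OP_HALT: assert(sp > 0); return stack[sp - 1];
        default:      assert(false && "unknown opcode"); return -1;
        }
    }
}
```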

I'm not really convinced that optimizing an interpreter's dispatch loop in this way will produce a good return on time invested (if it's an issue, the instructions should really be made to do more), but I suppose it shouldn't take long to try out a few different approaches and measure the difference. As always, in the event of unexpected behaviour, consult the generated assembly language to check for obvious inefficiencies. On x86, also check the machine code, since instruction length can be a factor.

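Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM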