Is comparison of uint8_t values in 64-bit Intel architecture slower than comparison of uint32_t?
Is comparison of uint8_t values in 64-bit Intel architecture slower than comparison of uint32_t? In particular, in Visual Studio C++.
On my machine it's too close to call:
#include <cstddef>
#include <cstdint>
#include <vector>
#include <random>
#include <iostream>
#include <chrono>
#include <algorithm>
#include <numeric>

// Fill a vector of T with values in [-127, 127], reproducible for a given seed.
template<class T>
std::vector<T> generate_test_data(std::size_t seed)
{
    auto v = std::vector<T>(20000000);
    std::default_random_engine eng(seed);
    // uniform_int_distribution is not specified for 8-bit types
    // (MSVC rejects it), so draw ints and narrow to T.
    std::uniform_int_distribution<int> dist(-127, 127);
    std::generate(std::begin(v), std::end(v),
                  [&eng, &dist] {
                      return static_cast<T>(dist(eng));
                  });
    return v;
}

// Accumulator step: count the values that are below zero.
auto inc_if_under_zero = [](int count, auto val) {
    return (val < 0) ? count + 1 : count;
};

int main()
{
    std::random_device rd;
    auto seed = rd();
    // Same seed for both, so both vectors hold the same sequence of values.
    auto int_data = generate_test_data<std::int32_t>(seed);
    auto byte_data = generate_test_data<std::int8_t>(seed);

    auto t0 = std::chrono::high_resolution_clock::now();
    auto less_zero_32 = std::accumulate(std::begin(int_data),
                                        std::end(int_data),
                                        0,
                                        inc_if_under_zero);
    auto t1 = std::chrono::high_resolution_clock::now();
    auto less_zero_8 = std::accumulate(std::begin(byte_data),
                                       std::end(byte_data),
                                       0,
                                       inc_if_under_zero);
    auto t2 = std::chrono::high_resolution_clock::now();

    auto int_time = std::chrono::duration_cast<std::chrono::microseconds>(t1-t0).count();
    auto byte_time = std::chrono::duration_cast<std::chrono::microseconds>(t2-t1).count();
    std::cout << "totals : " << less_zero_32 << ", " << less_zero_8 << std::endl;
    std::cout << "int time : " << int_time << "us" << std::endl;
    std::cout << "byte time : " << byte_time << "us" << std::endl;
}
Results with Apple clang, -O3:
totals : 9962877, 9962877
int time : 6644us
byte time : 6035us
Maybe the byte time is lower because there is less memory to traverse: the int8_t vector is a quarter of the size of the int32_t one.
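One way to probe that guess is to shrink the working set until both element types fit in cache, then repeat the pass many times, so that memory bandwidth matters much less. The following is only a rough sketch under that assumption: `count_negative`, `make_pattern` and `time_passes` are illustrative names that do not appear in the program above, and the buffer size that actually fits in cache depends on the machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts elements below zero (same predicate as inc_if_under_zero).
template <class T>
std::size_t count_negative(const std::vector<T>& v)
{
    std::size_t count = 0;
    for (T x : v) {
        if (x < 0) ++count;
    }
    return count;
}

// Hypothetical helper: fills n elements with a fixed pseudo-random pattern in [-127, 127].
// The same n gives the same values for every T, so counts are comparable.
template <class T>
std::vector<T> make_pattern(std::size_t n)
{
    std::vector<T> v(n);
    std::uint32_t state = 12345u;
    for (auto& x : v) {
        state = state * 1664525u + 1013904223u;  // linear congruential step
        x = static_cast<T>(static_cast<int>((state >> 16) % 255u) - 127);
    }
    return v;
}

// Hypothetical helper: runs `passes` counting passes over a private copy of the data
// and returns the elapsed microseconds. One element has its sign flipped before each
// pass so the optimizer cannot compute the count once and reuse it.
// `sink` receives the summed counts so the work is not discarded.
template <class T>
long long time_passes(std::vector<T> v, std::size_t passes, std::size_t& sink)
{
    assert(!v.empty());
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < passes; ++i) {
        T& e = v[i % v.size()];
        e = static_cast<T>(-e);  // safe: values stay within [-127, 127]
        sink += count_negative(v);
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

For example, calling `time_passes(make_pattern<std::int8_t>(4096), 100000, sink)` and the same with `std::int32_t` compares a 4 KiB buffer with a 16 KiB one. If the gap between the two timings shrinks compared with the 20,000,000-element run, that would be consistent with the memory-traffic explanation, though it would not prove it.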
There is only one way to find out: write a test program and measure.
The x64 instruction set is effectively a complex language which the processor decodes at run time into a series of much simpler internal instructions (micro-operations). The x64 instruction set is very stable, but those simpler instructions differ from chip to chip. It is thus not possible to say with any certainty whether the 8-bit comparison is slower or not.
In practice, I would be very surprised if there were a measurable difference.
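To see what the compiler actually emits for each width, one option is to compile two tiny functions and read the assembly listing (for example with `/FA` in Visual Studio, or `-S` with GCC or Clang). The function names below are made up for illustration. Note that in C++ the `uint8_t` operands are promoted to `int` before the comparison, so the source-level comparison is already done at `int` width. Which instructions come out depends on the compiler and its flags.

```cpp
#include <cstdint>

// Hypothetical probe functions: compile with optimizations on and compare the listings.
bool less_u8(std::uint8_t a, std::uint8_t b)
{
    return a < b;  // operands are promoted to int before comparing
}

bool less_u32(std::uint32_t a, std::uint32_t b)
{
    return a < b;  // compared as unsigned 32-bit values
}
```

A listing only shows the instructions, not how a particular chip executes them, so it complements a timing run rather than replacing it.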