
Is comparison of uint8_t values in 64-bit Intel architecture slower than comparison of uint32_t?

Is comparison of uint8_t values in 64-bit Intel architecture slower than comparison of uint32_t? In particular, in Visual Studio C++.

On my machine it's too close to call:

#include <cstdint>
#include <vector>
#include <random>
#include <iostream>
#include <chrono>
#include <algorithm>
#include <numeric>

template<class T>
std::vector<T> generate_test_data(std::size_t seed)
{
    auto v = std::vector<T>(20000000);
    std::default_random_engine eng(seed);
    // uniform_int_distribution is not defined for char-sized types, so draw
    // ints and narrow to T; both runs then see the same sequence of values
    std::uniform_int_distribution<std::int32_t> dist(-127, 127);
    std::generate(std::begin(v), std::end(v),
                  [&eng, &dist] {
                      return static_cast<T>(dist(eng));
                  });
    return v;
}

auto inc_if_under_zero = [](int count, auto val) {
    return (val < 0) ? count + 1 : count;
};

int main()
{
    std::random_device rd;
    auto seed = rd();
    auto int_data = generate_test_data<std::int32_t>(seed);
    auto byte_data = generate_test_data<std::int8_t>(seed);

    auto t0 = std::chrono::high_resolution_clock::now();

    auto less_zero_32 = std::accumulate(std::begin(int_data),
                                        std::end(int_data),
                                        0,
                                        inc_if_under_zero);
    auto t1 = std::chrono::high_resolution_clock::now();

    auto less_zero_8 = std::accumulate(std::begin(byte_data),
                                       std::end(byte_data),
                                       0,
                                       inc_if_under_zero);

    auto t2 = std::chrono::high_resolution_clock::now();

    auto int_time = std::chrono::duration_cast<std::chrono::microseconds>(t1-t0).count();
    auto byte_time = std::chrono::duration_cast<std::chrono::microseconds>(t2-t1).count();

    std::cout << "totals    : " << less_zero_32 << ", " << less_zero_8 << std::endl;
    std::cout << "int time  : " << int_time << "us" << std::endl;
    std::cout << "byte time : " << byte_time << "us" << std::endl;
}

Results with Apple Clang, -O3:

totals    : 9962877, 9962877
int time  : 6644us
byte time : 6035us

Maybe the byte time is lower because we have to traverse less memory.
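
A rough back-of-the-envelope check of that theory: with 20,000,000 elements, the int32_t vector occupies about 80 MB while the int8_t vector occupies about 20 MB, so the byte run reads a quarter of the memory. A minimal sketch (not part of the benchmark above) that prints the two footprints:

#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    constexpr std::size_t n = 20000000;  // same element count as the benchmark
    auto ints  = std::vector<std::int32_t>(n);
    auto bytes = std::vector<std::int8_t>(n);

    // footprint of the element storage only
    std::cout << "int32_t data : " << ints.size()  * sizeof(std::int32_t) << " bytes\n";  // ~80 MB
    std::cout << "int8_t data  : " << bytes.size() * sizeof(std::int8_t)  << " bytes\n";  // ~20 MB
}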

There is only one way to find out: write a test program and measure.

The x64 instruction set is effectively a complex language which is decoded at run time into a series of much simpler micro-operations. The x64 instruction set itself is very stable, but those micro-operations differ from chip to chip, so it is not possible to say with any certainty whether one comparison is slower than the other.

In practice, I would be very surprised if there were a measurable difference.
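
If you want to see what your compiler actually does with the two comparisons, a minimal sketch you can compile with optimizations (e.g. /O2 in Visual Studio, -O3 elsewhere) and inspect: on typical x64 code generation both end up as a single compare or sign test, differing only in operand width, but the exact output depends on your compiler and target.

#include <cstdint>
#include <iostream>

// Typically compiles to a sign test on an 8-bit register;
// check your compiler's assembly output to confirm.
bool negative8(std::int8_t v)   { return v < 0; }

// Typically compiles to a sign test on a 32-bit register.
bool negative32(std::int32_t v) { return v < 0; }

int main()
{
    std::cout << negative8(-5) << ' ' << negative32(-5) << '\n';  // prints: 1 1
}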
