
Is comparison of uint8_t values in 64-bit Intel architecture slower than comparison of uint32_t?

Is comparison of uint8_t values in 64-bit Intel architecture slower than comparison of uint32_t? In particular, in Visual Studio C++.

On my machine it's too close to call:

#include <cstdint>
#include <vector>
#include <random>
#include <iostream>
#include <chrono>
#include <algorithm>
#include <numeric>

template<class T>
std::vector<T> generate_test_data(std::size_t seed)
{
    auto v = std::vector<T>(20000000);
    std::default_random_engine eng(seed);
    // uniform_int_distribution is not defined for char-sized types, so draw
    // ints and narrow to T; both runs then see the same sequence of values
    std::uniform_int_distribution<std::int32_t> dist(-127, 127);
    std::generate(std::begin(v), std::end(v),
                  [&eng, &dist] {
                      return static_cast<T>(dist(eng));
                  });
    return v;
}

auto inc_if_under_zero = [](int count, auto val) {
    return (val < 0) ? count + 1 : count;
};

int main()
{
    std::random_device rd;
    auto seed = rd();
    auto int_data = generate_test_data<std::int32_t>(seed);
    auto byte_data = generate_test_data<std::int8_t>(seed);

    auto t0 = std::chrono::high_resolution_clock::now();

    auto less_zero_32 = std::accumulate(std::begin(int_data),
                                        std::end(int_data),
                                        0,
                                        inc_if_under_zero);
    auto t1 = std::chrono::high_resolution_clock::now();

    auto less_zero_8 = std::accumulate(std::begin(byte_data),
                                       std::end(byte_data),
                                       0,
                                       inc_if_under_zero);

    auto t2 = std::chrono::high_resolution_clock::now();

    auto int_time = std::chrono::duration_cast<std::chrono::microseconds>(t1-t0).count();
    auto byte_time = std::chrono::duration_cast<std::chrono::microseconds>(t2-t1).count();

    std::cout << "totals    : " << less_zero_32 << ", " << less_zero_8 << std::endl;
    std::cout << "int time  : " << int_time << "us" << std::endl;
    std::cout << "byte time : " << byte_time << "us" << std::endl;
}

Results with Apple Clang, -O3:

totals    : 9962877, 9962877
int time  : 6644us
byte time : 6035us

Maybe the byte time is lower because we have to traverse less memory.
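
A rough back-of-the-envelope check of that theory: with 20,000,000 elements, the int32_t vector occupies about 80 MB while the int8_t vector occupies about 20 MB, so the byte run reads a quarter of the memory. A minimal sketch (not part of the benchmark above) that prints the two footprints:

#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    constexpr std::size_t n = 20000000;  // same element count as the benchmark
    auto ints  = std::vector<std::int32_t>(n);
    auto bytes = std::vector<std::int8_t>(n);

    // footprint of the element storage only
    std::cout << "int32_t data : " << ints.size()  * sizeof(std::int32_t) << " bytes\n";  // ~80 MB
    std::cout << "int8_t data  : " << bytes.size() * sizeof(std::int8_t)  << " bytes\n";  // ~20 MB
}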

There is only one way to find out: write a test program and measure.

The x64 instruction set is effectively a complex language which is decoded at run time into a series of much simpler micro-operations. The x64 instruction set itself is very stable, but those micro-operations differ from chip to chip, so it is not possible to say with any certainty whether one comparison is slower than the other.

In practice, I would be very surprised if there were a measurable difference.
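
If you want to see what your compiler actually does with the two comparisons, a minimal sketch you can compile with optimizations (e.g. /O2 in Visual Studio, -O3 elsewhere) and inspect: on typical x64 code generation both end up as a single compare or sign test, differing only in operand width, but the exact output depends on your compiler and target.

#include <cstdint>
#include <iostream>

// Typically compiles to a sign test on an 8-bit register;
// check your compiler's assembly output to confirm.
bool negative8(std::int8_t v)   { return v < 0; }

// Typically compiles to a sign test on a 32-bit register.
bool negative32(std::int32_t v) { return v < 0; }

int main()
{
    std::cout << negative8(-5) << ' ' << negative32(-5) << '\n';  // prints: 1 1
}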
