C++11 std::atomic<int> ++ is much slower than std::mutex protected int++, why?

Question

To compare the performance of std::atomic<int> ++ against std::mutex protected int ++, I wrote this test program:

#include <iostream>
#include <atomic>
#include <mutex>
#include <thread>
#include <chrono>
#include <limits>
using namespace std;
#ifndef INT_MAX
const int INT_MAX = numeric_limits<std::int32_t>::max();
const int INT_MIN = numeric_limits<std::int32_t>::min();
#endif
using std::chrono::steady_clock;
const size_t LOOP_COUNT = 12500000;
const size_t THREAD_COUNT = 8;
int intArray[2] = { 0, INT_MAX };
atomic<int> atomicArray[2];
void atomic_tf() {//3.19s
    for (size_t i = 0; i < LOOP_COUNT; ++i) {
        atomicArray[0]++;
        atomicArray[1]--;
    }
}
mutex m;
void mutex_tf() {//0.25s
    m.lock();
    for (size_t i = 0; i < LOOP_COUNT; ++i) {
        intArray[0]++;
        intArray[1]--;
    }
    m.unlock();
}
int main() {
    {
        atomicArray[0] = 0;
        atomicArray[1] = INT_MAX;
        thread tp[THREAD_COUNT];
        steady_clock::time_point t1 = steady_clock::now();
        for (size_t t = 0; t < THREAD_COUNT; ++t) {
            tp[t] = thread(atomic_tf);
        }
        for (size_t t = 0; t < THREAD_COUNT; ++t) {
            tp[t].join();
        }
        steady_clock::time_point t2 = steady_clock::now();
        cout << (float)((t2 - t1).count()) / 1000000000 << endl;
    }
    {
        thread tp[THREAD_COUNT];
        steady_clock::time_point t1 = steady_clock::now();
        for (size_t t = 0; t < THREAD_COUNT; ++t) {
            tp[t] = thread(mutex_tf);
        }
        for (size_t t = 0; t < THREAD_COUNT; ++t) {
            tp[t].join();
        }
        steady_clock::time_point t2 = steady_clock::now();
        cout << (float)((t2 - t1).count()) / 1000000000 << endl;
    }
    return 0;
}

I ran this program on Windows and Linux many times (compiled with clang++ 14 and g++ 12), with basically the same result:

atomic_tf takes 3+ seconds.
mutex_tf takes 0.25+ seconds.

That is a performance difference of roughly 10x.

My question is: if my test program is valid, does it indicate that using atomic variables is much more expensive than using a mutex plus normal variables?

Where does this performance difference come from? Thanks!

Answer 1

Your mutex version locks the mutex once, then does 12500000 iterations without paying any additional cost for thread synchronization. Each thread holds the lock for its entire loop, so the threads effectively run one after another.

In your atomic version, you pay the cost of atomic synchronization for every increment and every decrement of the atomic value (each happens 12500000 times per thread).
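To make the per-operation cost concrete, here is a minimal sketch of what one loop iteration of atomic_tf does. The helper name one_iteration is hypothetical and only for illustration. On std::atomic<int>, the ++ and -- operators are atomic read-modify-write operations equivalent to fetch_add(1) and fetch_sub(1) with the default sequentially consistent ordering, so every iteration performs two of them on memory shared by all threads.

```cpp
#include <atomic>

// Hypothetical helper (not part of the question's program): the same two
// operations as one iteration of atomic_tf, written as explicit
// read-modify-write calls.
inline void one_iteration(std::atomic<int>& up, std::atomic<int>& down) {
    up.fetch_add(1);    // same effect as up++   (default: memory_order_seq_cst)
    down.fetch_sub(1);  // same effect as down-- (default: memory_order_seq_cst)
}
```

With 8 threads all hitting the same two atomics, each of these calls has to be coordinated with the other threads, which the lock-once mutex version never does inside its loop.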

Therefore your test does not really compare the performance of a mutex against an atomic.

If you want to do that, you can lock and unlock the mutex for every increment or decrement of the value.

Something like:

void mutex_tf() 
{
    for (size_t i = 0; i < LOOP_COUNT; ++i) 
    {
        m.lock();
        intArray[0]++;
        m.unlock(); 

        m.lock();
        intArray[1]--;
        m.unlock(); 
    }
}
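For a like-for-like measurement, a self-contained sketch along the following lines may help. It is only an illustration under assumptions: the names time_threads, Result, bench_atomic, and bench_mutex_per_op are hypothetical, it uses a single counter instead of the two-element arrays in the question, and it uses std::lock_guard rather than manual lock()/unlock() calls. Both variants synchronize once per increment, so the timings compare the same amount of synchronization work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs `work` on `threads` threads and returns the
// elapsed wall-clock time in seconds.
template <typename F>
double time_threads(std::size_t threads, F work) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) pool.emplace_back(work);
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

struct Result {
    double seconds;        // elapsed time for all threads
    long long final_value; // counter value after all threads finished
};

// One atomic read-modify-write per increment.
Result bench_atomic(std::size_t threads, std::size_t iterations) {
    std::atomic<long long> counter{0};
    double s = time_threads(threads, [&] {
        for (std::size_t i = 0; i < iterations; ++i) ++counter;
    });
    assert(counter.load() == static_cast<long long>(threads * iterations));
    return {s, counter.load()};
}

// One lock/unlock pair per increment, matching the granularity of the atomic.
Result bench_mutex_per_op(std::size_t threads, std::size_t iterations) {
    std::mutex m;
    long long counter = 0;
    double s = time_threads(threads, [&] {
        for (std::size_t i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(m); // released at end of iteration
            ++counter;
        }
    });
    assert(counter == static_cast<long long>(threads * iterations));
    return {s, counter};
}
```

Calling bench_atomic(8, 12500000) and bench_mutex_per_op(8, 12500000) mirrors the thread and loop counts from the question. The resulting times depend on the hardware, compiler, and standard library, so measure on the target machine rather than assuming a particular ratio.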
