Two threads incrementing a number

Question

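This was a test task given to me, which I evidently failed:
1. Use two threads to increment an integer. Thread A increments when the value is even, thread B increments when it is odd (for the integer problem, we can cap it at a maximum number supplied on the command line).
1a. What are some of the difficulties in adding more threads? Please show the difficulties with code.
1b. Extra credit: design an improved solution that scales with many threads.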

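The feedback after the first attempt was "did not address atomic modification and false sharing". I tried to address both, but got no feedback on the second attempt. I want to learn from this test, so I thought I would ask the top experts: you. Thanks in advance. Below is the header from the first attempt: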

#include <iostream>
#include <mutex>
#include <atomic>

class CIntToInc
{
private:
 int m_nVal; //std::atomic<int> m_nVal;
 int m_nMaxVal;
public:
 CIntToInc(int p_nVal, int p_nMaxVal) : m_nVal(p_nVal), m_nMaxVal(p_nMaxVal) { }
 const int GetVal() const { return m_nVal; }
 const int GetMaxVal() const { return m_nMaxVal; }
 void operator ++() { ++m_nVal; }
};

struct COper
{
 enum class eOper { None = 0, Mutex = 1, NoMutex = 2 };
 eOper m_Oper;
public:
 friend std::istream& operator>> (std::istream &in, COper &Oper);
 bool operator == (const eOper &p_eOper) { return(m_Oper == p_eOper); }
};

Below is the source file from the first attempt. It includes my thoughts on why the solution works. I compiled the code with MSVS2012.

// Notes: 
// 1a.
// Since an integer cannot be an odd number and an even number at the same time, thread separation happens naturally when each thread checks the value.
// This way no additional synchronization is necessary and both threads can run at will, provided that it's all they are doing.
// It's probably not even necessary to declare the target value atomic because it changes (and thus lets the other thread increment itself) only at the last moment.
// I would still opt for making it atomic.
// Adding more threads to this setup immediately creates a problem with threads of equal condition (even or odd) stepping on each other.
// 1b.
// By using a mutex threads can cleanly separate. Many threads with the same condition can run concurrently.
// Note: there is no guarantee that each individual thread from a pool of equally conditioned threads will get to increment the number.
// For this method reading has to be inside the mutex lock to prevent a situation where a thread may see the value as incrementable, yet when it gets to it, the value has already 
// been changed by another thread and no longer qualifies.
// cout message output is separated in this approach.
// 
// The speed of the "raw" approach is 10 times faster than that of the mutex approach on an equal number of threads (two) with the mutex time increasing further as you add threads.
// Use 10000000 for the max to feel the difference, watch the CPU graph
//
// If the operation is complex and time consuming, the approach needs to be different still. The "increment" functionality can be wrapped up in a pimpl class, a copy can be made
// and "incremented". When ready, the thread will check for whether the value has changed while the operation was being performed on the copy and, if not, a fast swap under the mutex
// could be attempted. This approach is resource-intensive, but it minimizes lock time.
//
// The approach above will work if the operation does not involve resources that cannot be easily copied (like a file to the end of which we are writing)
// When such resources are present, the algorithm probably has to implement a thread safe queue.
// END

#include "test.h"
#include <thread>
#include <chrono> // std::chrono::high_resolution_clock, used in main_test()

int main_test();

int main(int argc, char* argv[])
{
 main_test();
 return(0);
}

void IncrementInt2(CIntToInc &p_rIi, bool p_bIfEven, const char *p_ThreadName, std::mutex *p_pMu)
// the version that uses a mutex
// enable cout output to see thread messages
{
 int nVal(0);
 while(true) {
   p_pMu->lock();
   // parenthesized so that nVal receives the value, not the result of the comparison
   bool DoWork = ((nVal = p_rIi.GetVal()) < p_rIi.GetMaxVal());
   if(DoWork) {
     //std::cout << "Thread " << p_ThreadName << ": nVal=" << nVal << std::endl;
     if((!(nVal % 2) && p_bIfEven) || (nVal % 2 && !p_bIfEven)) {
      //std::cout << "incrementing" << std::endl;
      ++p_rIi; } }
   p_pMu->unlock();
   if(!DoWork) break;
   //if(p_bIfEven) // uncomment to force threads to execute differently
   // std::this_thread::sleep_for(std::chrono::milliseconds(10));
   }
}

void IncrementInt3(CIntToInc &p_rIi, bool p_bIfEven, const char *p_ThreadName)
// the version that does not use a mutex
// enable cout output to see thread messages. Message text output is not synchronized
{
 int nVal(0);
 while((nVal = p_rIi.GetVal()) < p_rIi.GetMaxVal()) {
   //std::cout << "Thread " << p_ThreadName << ": nVal=" << nVal << std::endl;
   if((!(nVal % 2) && p_bIfEven) || (nVal % 2 && !p_bIfEven)) {
    //std::cout << "Thread " << p_ThreadName << " incrementing" << std::endl;
    ++p_rIi; }
    }
}

std::istream& operator>> (std::istream &in, COper &Oper)
// to read operation types from cin
{
 int nVal;
 std::cin >> nVal;
 switch(nVal) {
   case 1: Oper.m_Oper = COper::eOper::Mutex; break;
   case 2: Oper.m_Oper = COper::eOper::NoMutex; break;
   default: Oper.m_Oper = COper::eOper::None; }
 return in;
}

int main_test()
{
 int MaxValue = 0, FinalValue = 0; // initialized so the final print is defined even if no method is selected
 COper Oper;
 std::cout << "Please enter the number to increment to: ";
 std::cin >> MaxValue;
 std::cout << "Please enter the method (1 - mutex, 2 - no mutex): ";
 std::cin >> Oper;

 auto StartTime(std::chrono::high_resolution_clock::now());

 if(Oper == COper::eOper::Mutex) {
   std::mutex Mu;
   CIntToInc ii(0, MaxValue);
   std::thread teven(IncrementInt2, std::ref(ii), true, "Even", &Mu);
   std::thread todd(IncrementInt2, std::ref(ii), false, "Odd", &Mu);
   // add more threads at will, should be safe
   //std::thread teven2(IncrementInt2, std::ref(ii), true, "Even2", &Mu);
   //std::thread teven3(IncrementInt2, std::ref(ii), true, "Even3", &Mu);
   teven.join();
   todd.join();
   //teven2.join();
   //teven3.join();
   FinalValue = ii.GetVal();
   }
 else if(Oper == COper::eOper::NoMutex) {
   CIntToInc ii(0, MaxValue);
   std::thread teven(IncrementInt3, std::ref(ii), true, "Even");
   std::thread todd(IncrementInt3, std::ref(ii), false, "Odd");
   teven.join();
   todd.join();
   FinalValue = ii.GetVal(); }

 std::chrono::duration<double>elapsed_seconds = (std::chrono::high_resolution_clock::now() - StartTime);
 std::cout << "main_mutex completed with nVal=" << FinalValue << " in " << elapsed_seconds.count() << " seconds" << std::endl;

 return(0);
}

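For the second attempt, I made the following changes to the header:
Made m_nVal a std::atomic
Used atomic methods to increment and retrieve m_nVal
Separated m_nVal from the read-only m_nMaxVal with a filler
The source file did not change. The new header is below.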

#include <iostream>
#include <mutex>
#include <atomic>
class CIntToInc
{
private:
 int m_nMaxVal;
 char m_Filler[64 - sizeof(int)]; // false sharing prevention, assuming a 64 byte cache line
 std::atomic<int> m_nVal;

public:
 CIntToInc(int p_nVal, int p_nMaxVal) : m_nVal(p_nVal), m_nMaxVal(p_nMaxVal) { }
 const int GetVal() const { 
   //return m_nVal;
   return m_nVal.load(); // std::memory_order_relaxed);
   }
 const int GetMaxVal() const { return m_nMaxVal; }
 void operator ++() { 
   //++m_nVal;
   m_nVal.fetch_add(1); //, std::memory_order_relaxed); // relaxed is enough since we check this very variable
   }
};

struct COper
{
 enum class eOper { None = 0, Mutex = 1, NoMutex = 2 };
 eOper m_Oper;
public:
 friend std::istream& operator>> (std::istream &in, COper &Oper);
 bool operator == (const eOper &p_eOper) { return(m_Oper == p_eOper); }
};

I don't know whether this approach is fundamentally wrong or whether it contains one or more smaller errors. Any help is greatly appreciated.

Answer 1

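Your reasoning for why you don't need synchronization is flawed. You do need synchronization, even though the threads naturally alternate who the writer is. As Pete Becker said, an unsynchronized writer and reader is undefined behavior. You cannot predict how it will break, but sometimes you can watch the optimizer make assumptions about your code and do bad things: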

Here a thread immediately sets keep_going to false, which "should" stop the loop:

#include <iostream>
#include <thread>

int main() {
    bool keep_going = true;
    unsigned x = 999;

    auto thr = std::thread([&]() mutable { 
        keep_going = false;  // unsync write ...
    });   

    while (keep_going) {     // ... unsync read - undefined behavior
       ++x;
    }

    thr.join();
    std::cout << x << std::endl;
}

Live: https://godbolt.org/z/P1rnf8s71

Yet built with g++ it never stops running. Why? The loop optimizer sees several things:

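Although keep_going is used in the lambda, the optimizer has no reason to assume the lambda runs in a background thread, because there is no synchronization.
So by the time the loop is entered, if the lambda was going to change keep_going, it has already changed it.
Since nothing in the loop writes to keep_going, its state cannot change once the loop is entered, so the test can be hoisted out of the loop.
Likewise, since the loop then cannot exit and only writes to x, those writes can never be observed, so the wasted work is eliminated.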

So the optimizer works under the as-if rule, and the result is equivalent to:

bool keep_going = true;
call_ordinary_function(keep_going);
if (keep_going) {
   top:
   goto top;
}

The generated assembly reflects this:

        call    [QWORD PTR [rax+8]]
.L7:
        cmp     BYTE PTR [rsp+31], 0
        je      .L30

.L8:
        jmp     .L8    <<<< truly infinite loop

.L30:

Not what you expected?

But declaring the boolean atomic changes everything:

std::atomic<bool> keep_going = true;

Now the generated code is:

.L7:
        mov     ebx, 999
        jmp     .L8
.L11:
        add     ebx, 1
.L8:
        movzx   eax, BYTE PTR [rsp+31]
        test    al, al
        jne     .L11
        lea     rdi, [rsp+32]

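So now we see:

x is now incremented (because the loop can terminate, the changes to x are visible after the loop),
keep_going is loaded on every iteration: it is read into eax and actually tested inside the loop,
and the program actually terminates.

I hope this convinces you that even when you think synchronization is unnecessary, the generated code may not be what you expect.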
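
For reference, here is a minimal sketch of the same experiment with the flag declared atomic, wrapped in a function so it can be called repeatedly. The name run_until_stopped is hypothetical, and x is widened to unsigned long long (an assumption made here to rule out wraparound). The returned count depends on timing and differs from run to run.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: the experiment above, but with an atomic flag.
// Returns the final counter, which is timing-dependent.
unsigned long long run_until_stopped() {
    std::atomic<bool> keep_going{true};
    unsigned long long x = 999;

    std::thread thr([&keep_going] {
        keep_going.store(false);   // synchronized write
    });

    while (keep_going.load()) {    // synchronized read, re-evaluated every iteration
        ++x;
    }

    thr.join();
    return x;                      // never below 999; the loop ends once the store is visible
}
```

Because the load can no longer be hoisted out of the loop, the compiler has to keep both the test and the increments.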

Answer 2

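First, the critical section (lock + unlock) includes the odd/even check and runs in a busy loop. As a result, both threads compete for the mutex even though only one of them can make progress at any given time. In the worst case, thread 1 increments the value and then keeps locking and unlocking the mutex (to actively perform the check) while thread 2 waits a long time to lock the mutex and increment the value. This scenario is far from theoretical, because thread 1 generally has priority on the mutex (due to the way CPU caches and the operating system work).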

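One way to fix this is to use a condition variable. The idea is to lock a mutex, increment the value, notify the next thread that is allowed to increment the value, and then wait to be woken by another thread. This solution scales well, but it is generally slow when the work is very small, because waiting introduces unnecessary latency (typically due to context switches). It works well when the number of threads is much larger than the number of cores. When the number of threads is small (or when there are many threads and a thread's turn is about to come), busy reading of an atomic variable can be used to reduce this cost.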
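
The condition-variable idea could look roughly like the sketch below. It is not the answerer's code: the function name count_with_condition_variable and the rule "worker i increments when value % num_threads == i" are assumptions chosen to generalize even/odd to any number of threads.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads workers take turns; worker `id`
// increments only when value % num_threads == id. Returns the final value.
int count_with_condition_variable(int max_value, int num_threads) {
    std::mutex mu;
    std::condition_variable cv;
    int value = 0;  // protected by mu

    auto worker = [&](int id) {
        std::unique_lock<std::mutex> lock(mu);
        while (true) {
            // Sleep until it is this worker's turn or the work is finished.
            cv.wait(lock, [&] {
                return value >= max_value || value % num_threads == id;
            });
            if (value >= max_value) break;
            ++value;
            cv.notify_all();  // wake the others; only the next in line proceeds
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) threads.emplace_back(worker, i);
    for (auto& t : threads) t.join();
    return value;
}
```

With num_threads == 2 this is the even/odd task. A single shared condition variable with notify_all keeps the sketch short but wakes every waiter on each step; one condition variable per worker would likely reduce that cost.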

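Another solution is to use two (binary) semaphores. Initially, one is acquired and the other is not. Each thread tries to acquire its own semaphore, increments the integer, and then releases the other one, which results in a ping-pong-like execution.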
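
A sketch of the ping-pong scheme follows. To stay buildable before C++20 it uses a small hand-written BinarySemaphore class (hypothetical, built on a mutex and a condition variable); with C++20, std::binary_semaphore could presumably take its place. The function name count_with_semaphores is also made up for this example.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Minimal binary semaphore (hypothetical helper, not a standard class).
class BinarySemaphore {
public:
    explicit BinarySemaphore(bool available) : available_(available) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return available_; });
        available_ = false;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            available_ = true;
        }
        cv_.notify_one();
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool available_;
};

// Even and odd threads hand the turn back and forth. Returns the final value.
int count_with_semaphores(int max_value) {
    BinarySemaphore even_turn(true);   // value starts at 0, so "even" goes first
    BinarySemaphore odd_turn(false);
    int value = 0;  // only touched by the thread that currently holds the turn

    auto worker = [&](BinarySemaphore& mine, BinarySemaphore& other) {
        while (true) {
            mine.acquire();
            const bool done = value >= max_value;
            if (!done) ++value;
            other.release();  // always pass the turn on so the peer can exit too
            if (done) break;
        }
    };

    std::thread even(worker, std::ref(even_turn), std::ref(odd_turn));
    std::thread odd(worker, std::ref(odd_turn), std::ref(even_turn));
    even.join();
    odd.join();
    return value;
}
```

The plain int needs no atomic here because every access happens between an acquire and the matching release, which orders the two threads.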

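False sharing is the least of the problems in your first attempt. Indeed, while there may be false sharing between the mutex and the incremented integer, it is not an issue because the mutex protects the integer (and also takes care of visibility between threads).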

Note that you can use lock_guard to make the code safer and easier to read. Also, the condition (!(nVal % 2) && p_bIfEven) || (nVal % 2 && !p_bIfEven) is much more complex than it needs to be. Consider using (nVal % 2) ^ p_bIfEven.
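
As an illustration of both suggestions, here is a hedged rewrite of the mutex-based loop. The names my_turn, increment_with_guard and count_with_lock_guard are hypothetical. The turn check is written with != on two bools, which is XOR for booleans and avoids mixing int and bool operands.

```cpp
#include <mutex>
#include <thread>

// True when a thread with the given role may increment `value`:
// "value is odd" XOR "this thread handles even values".
bool my_turn(int value, bool if_even) {
    return ((value % 2) != 0) != if_even;
}

// Hypothetical rewrite of the mutex-based worker from the question.
void increment_with_guard(int& value, int max_value, bool if_even, std::mutex& mu) {
    while (true) {
        std::lock_guard<std::mutex> guard(mu);  // released at the end of each iteration
        if (value >= max_value) break;          // guard also unlocks on this path
        if (my_turn(value, if_even)) ++value;
    }
}

// Runs one even and one odd worker and returns the final value.
int count_with_lock_guard(int max_value) {
    int value = 0;
    std::mutex mu;
    std::thread even([&] { increment_with_guard(value, max_value, true, mu); });
    std::thread odd([&] { increment_with_guard(value, max_value, false, mu); });
    even.join();
    odd.join();
    return value;
}
```

This keeps the busy loop criticized above; it only removes the manual lock()/unlock() pairs and simplifies the condition.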

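For the second attempt, it is not clear whether you use the atomics together with the mutex. Note that there is no need to use them together. In fact, it is a bad idea because of the extra overhead the atomics introduce. That said, if you choose to use only atomic variables, then you need a (weak) compare-and-swap so that the value of the atomic variable is checked and changed atomically. This solution is fast as long as the number of threads is smaller than the number of cores (because of the busy waiting).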
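
A minimal sketch of the atomics-only variant, assuming several threads may share the same parity role. The names cas_worker and count_with_cas are hypothetical, and the default (sequentially consistent) memory order is used throughout for simplicity.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: increments `value` only while its parity matches
// this thread's role, and only if nobody else changed it in the meantime.
void cas_worker(std::atomic<int>& value, int max_value, bool if_even) {
    while (true) {
        int seen = value.load();
        if (seen >= max_value) break;
        if ((seen % 2 == 0) == if_even) {
            // Succeeds only if value still equals `seen`. A real or spurious
            // failure just sends us around the loop to re-read and re-check.
            value.compare_exchange_weak(seen, seen + 1);
        } else {
            std::this_thread::yield();  // not our turn: busy-wait politely
        }
    }
}

// Runs threads_per_parity even workers and as many odd workers.
int count_with_cas(int max_value, int threads_per_parity) {
    std::atomic<int> value{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < threads_per_parity; ++i) {
        threads.emplace_back(cas_worker, std::ref(value), max_value, true);
        threads.emplace_back(cas_worker, std::ref(value), max_value, false);
    }
    for (auto& t : threads) t.join();
    return value.load();
}
```

A separate load followed by fetch_add would let two same-parity threads both pass the check and both increment; the compare-and-swap ties the check and the update together, so the counter never overshoots max_value.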

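Regarding false sharing in the second attempt, m_Filler is not enough to guarantee that there is no false sharing (nor is it very straightforward). Indeed, whatever is stored after the std::atomic may cause false sharing (std::atomic is not guaranteed to add padding to prevent false sharing, and in practice it generally does not). You can use alignas(64) std::atomic<int> m_nVal; alignas(64) char padding; instead. Note that 64 is architecture-dependent, and in theory alignas(std::hardware_destructive_interference_size) should be used instead.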
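
A sketch of the alignas-based layout, under the assumption of 64-byte cache lines. PaddedCounter and kCacheLine are hypothetical names, and the layout aligns the read-only member instead of adding a separate padding byte, which has the same effect on the struct's size.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines, as in the question. Where the standard
// library provides it, std::hardware_destructive_interference_size
// (C++17, <new>) is the portable spelling.
constexpr std::size_t kCacheLine = 64;

struct PaddedCounter {
    alignas(kCacheLine) std::atomic<int> value{0};  // written by the workers
    alignas(kCacheLine) int max_value = 0;          // read-only after setup
};

// alignas on the members raises the alignment of the whole struct and pads
// its size, so neighbouring objects cannot share a line with `value`.
static_assert(alignof(PaddedCounter) == kCacheLine, "struct starts on a line boundary");
static_assert(sizeof(PaddedCounter) % kCacheLine == 0, "size is a whole number of lines");
```

Unlike a filler array placed before the atomic, this also protects against whatever the compiler or allocator places after the object.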

2 solutions

Solution 1
1 2021-12-16 22:12:15

Solution 2
0 2021-12-16 22:41:40