简体   繁体   中英

Two-threaded app is slower than single-threaded on C++ (VC++ 2010 Express). How to solve?

I have some program that allocate memory a lot, I hoped to boost it's speed by splitting task on threads, but it made my program only slower.

I made this minimal example that has nothing to do with my real code aside of the fact it allocate memory in different threads.

class ThreadStartInfo
{
public:
    unsigned char *arr_of_5m_elems;
    bool TaskDoneFlag;

    ThreadStartInfo()
    {
        this->TaskDoneFlag = false;
        this->arr_of_5m_elems = NULL;
    }

    ~ThreadStartInfo()
    {
        if (this->arr_of_5m_elems)
            free(this->arr_of_5m_elems);
    }
};

unsigned long __stdcall CalcSomething(void *tsi_ptr)
{
    ThreadStartInfo *tsi = (ThreadStartInfo*)tsi_ptr;

    for (int i = 0; i < 5000000; i++)
    {
        double *test_ptr = (double*)malloc(tsi->arr_of_5m_elems[i] * sizeof(double));
        memset(test_ptr, 0, tsi->arr_of_5m_elems[i] * sizeof(double));
        free(test_ptr);
    }

    tsi->TaskDoneFlag = true;
    return 0;
}

void main()
{
    ThreadStartInfo *tsi1 = new ThreadStartInfo();
    tsi1->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
    ThreadStartInfo *tsi2 = new ThreadStartInfo();
    tsi2->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
    ThreadStartInfo **tsi_arr = (ThreadStartInfo**)malloc(2 * sizeof(ThreadStartInfo*));
    tsi_arr[0] = tsi1;
    tsi_arr[1] = tsi2;

    time_t start_dt = time(NULL);
    CalcSomething(tsi1);
    CalcSomething(tsi2);
    printf("Task done in %i seconds.\n", time(NULL) - start_dt);
    //--

    tsi1->TaskDoneFlag = false;
    tsi2->TaskDoneFlag = false;
    //--

    start_dt = time(NULL);
    unsigned long th1_id = 0;
    void *th1h = CreateThread(NULL, 0, CalcSomething, tsi1, 0, &th1_id);
    unsigned long th2_id = 0;
    void *th2h = CreateThread(NULL, 0, CalcSomething, tsi2, 0, &th2_id);

    retry:
    for (int i = 0; i < 2; i++)
        if (!tsi_arr[i]->TaskDoneFlag)
        {
            Sleep(100);
            goto retry;
        }

    CloseHandle(th1h);
    CloseHandle(th2h);

    printf("MT Task done in %i seconds.\n", time(NULL) - start_dt);
}

It prints me such results:

Task done in 16 seconds.
MT Task done in 19 seconds.

And... I didn't expected slow down. Is there anyway to make memory allocations faster in multiple threads?

Apart from some undefined behavior due to lack of synchronization on TaskDoneFlag , all the threads are doing is calling malloc / free repeatedly.

The Visual C++ CRT heap is single-threaded 1 , as malloc / free delegate to HeapAlloc / HeapFree which execute in a critical section (only one thread at a time). Calling them from more than one thread at a time will never be faster than a single thread, and often slower due to the lock contention overhead.

Either reduce allocations in threads or switch to another memory allocator, like jemalloc or tcmalloc.


1 See this note for HeapAlloc :

Serialization ensures mutual exclusion when two or more threads attempt to simultaneously allocate or free blocks from the same heap. There is a small performance cost to serialization, but it must be used whenever multiple threads allocate and free memory from the same heap. Setting the HEAP_NO_SERIALIZE value eliminates mutual exclusion on the heap. Without serialization, two or more threads that use the same heap handle might attempt to allocate or free memory simultaneously, likely causing corruption in the heap.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM