pthread and multicore on Windows

Question

My question relates to the pthread library and making use of a multicore system. Under the right parameters and for small data sizes the run time seems to improve, with the most improvement at a data size of around 65000. The data suggests that as the number of threads increases, the time first decreases but then increases again shortly afterward. With 1, 2, 4, and sometimes 8 threads it might slowly increase, but at 16 threads the time begins to decrease again. For large data sizes there is no improvement and all the times stay fairly close together. If someone could tell me whether something is forcing my threads to run sequentially, or whether there is another issue, that would be awesome.

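Here's the data. Each row shows, in order: the start time from time(), the elapsed clock() ticks, the number of threads, the data size, the elapsed time in whole seconds, and the end time from time().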

1395525080 0 num thread: 1 data size: 1024 0 1395525080
1395525080 0 num thread: 2 data size: 1024 0 1395525080
1395525080 0 num thread: 4 data size: 1024 0 1395525080
1395525080 15 num thread: 8 data size: 1024 0 1395525080
1395525080 47 num thread: 16 data size: 1024 0 1395525080
1395525080 31 num thread: 32 data size: 1024 0 1395525080
1395525080 16 num thread: 1 data size: 4096 0 1395525080
1395525080 0 num thread: 2 data size: 4096 0 1395525080
1395525080 0 num thread: 4 data size: 4096 0 1395525080
1395525080 15 num thread: 8 data size: 4096 0 1395525080
1395525080 78 num thread: 16 data size: 4096 0 1395525080
1395525080 31 num thread: 32 data size: 4096 0 1395525080
1395525080 140 num thread: 1 data size: 65536 0 1395525080
1395525081 156 num thread: 2 data size: 65536 0 1395525081
1395525081 109 num thread: 4 data size: 65536 0 1395525081
1395525081 94 num thread: 8 data size: 65536 0 1395525081
1395525081 93 num thread: 16 data size: 65536 0 1395525081
1395525081 187 num thread: 32 data size: 65536 0 1395525082
1395525082 171 num thread: 1 data size: 75536 0 1395525082
1395525082 172 num thread: 2 data size: 75536 0 1395525082
1395525082 141 num thread: 4 data size: 75536 0 1395525083
1395525083 109 num thread: 8 data size: 75536 0 1395525083
1395525083 140 num thread: 16 data size: 75536 0 1395525083
1395525083 234 num thread: 32 data size: 75536 0 1395525084
1395525084 203 num thread: 1 data size: 85536 0 1395525084
1395525084 203 num thread: 2 data size: 85536 0 1395525084
1395525084 172 num thread: 4 data size: 85536 0 1395525085
1395525085 202 num thread: 8 data size: 85536 0 1395525085
1395525085 125 num thread: 16 data size: 85536 0 1395525085
1395525085 187 num thread: 32 data size: 85536 0 1395525086
1395525086 125 num thread: 1 data size: 55536 0 1395525086
1395525086 109 num thread: 2 data size: 55536 0 1395525086
1395525086 141 num thread: 4 data size: 55536 0 1395525086
1395525086 78 num thread: 8 data size: 55536 0 1395525086
1395525087 140 num thread: 16 data size: 55536 0 1395525087
1395525087 156 num thread: 32 data size: 55536 0 1395525087
1395525120 153271 num thread: 1 data size: 70000000 153 1395525274
1395525398 152630 num thread: 2 data size: 70000000 152 1395525551
1395525675 154846 num thread: 4 data size: 70000000 154 1395525830
1395525956 153988 num thread: 8 data size: 70000000 153 1395526110
1395526236 153956 num thread: 16 data size: 70000000 153 1395526390
1395526515 157935 num thread: 32 data size: 70000000 157 1395526673

Here's the code. It does a traditional bucket sort. I have two other similar programs with similar data that also do bucket sorts, and the sequential code produces almost exactly the same values.

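#include <tchar.h>     // _tmain, _TCHAR (MSVC)
#include <pthread.h>   // pthreads port for Windows
#include <algorithm>   // std::sort
#include <cstdlib>     // rand
#include <ctime>       // clock, time
#include <iostream>
#include <thread>      // std::thread (only for the unused thread_array)
#include <vector>

// One bucket per worker thread; each thread sorts its own bucket in place.
struct bucket
{
    std::vector<int> data;
};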


void *sort_bucket(void *unsorted_bucket);
int _tmain(int argc, _TCHAR* argv[])
{
    int array_N[] = {1024, 4096, 65536,75536,85536,55536, 70000000, 16777216};
    int array_number_of_threads[] = {1, 2, 4, 8, 16, 32};
    std::vector<int> N;
    std::vector<int> number_of_threads;
    number_of_threads.assign(array_number_of_threads, array_number_of_threads+6);
    N.assign(array_N, array_N+7);

    for(int size_index = 0; size_index < N.size(); size_index++)
    {
        for(int thread_index = 0; thread_index < number_of_threads.size(); thread_index++)
        {
            std::vector<int> unsorted_data;
            std::vector<int> sorted_data;
            std::vector<std::thread> thread_array;
            std::vector<bucket> buckets;

            std::vector<pthread_t> thread;

            while(buckets.size() < number_of_threads[thread_index]){ // checks against the number of threads and creates the number of buckets
                bucket new_bucket;
                pthread_t new_thread;
                buckets.push_back(new_bucket);
                thread.push_back(new_thread);
            }

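            // gathers the data
            // Note: rand() returns at most RAND_MAX, which is only 32767 with MSVC,
            // so for N larger than that the values cover only part of the range [0, N).
            for(int index = 0; index < N[size_index]; index++)
            {
                unsorted_data.push_back(rand() % N[size_index]);
            }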

            clock_t t = 0;
            t = clock();
            time_t start = 0;
            time_t end = 0;

            time(&start);
            std::cout << start << " ";

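            // Width of the value range covered by each bucket; at least 1 so the division below is safe.
            int difference = N[size_index]/number_of_threads[thread_index];
            if(difference < 1) difference = 1;
            int last_bucket = number_of_threads[thread_index] - 1;
            int placeholder = 0;
            for(int index = 0; index < N[size_index]; index++) {//calculates which bucket the data belong in and places the data in that bucket
                //std::cout << unsorted_data[index] << " " << difference << " ";
                placeholder = unsorted_data[index]/difference;
                // N may not divide evenly by the thread count, so clamp to the last bucket.
                if(placeholder > last_bucket) placeholder = last_bucket;
                //std::cout << placeholder << std::endl;
                buckets[placeholder].data.push_back(unsorted_data[index]);
            }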
            for(int index = 0; index < number_of_threads[thread_index]; index++){ // sends the data to the threads
                //thread_array.push_back(std::thread(sort_bucket ,buckets[index]));
                // Pass the bucket itself; sort_bucket casts its argument back to bucket*.
                pthread_create(&thread[index],
                               NULL,
                               sort_bucket ,
                               (void*) &buckets[index]);
            }
            // bring the data back to the root process
            for(int index = 0; index < number_of_threads[thread_index]; index++)        {
                void *data;
                struct bucket *ret_bucket;
                pthread_join(thread[index],(void**) &data);
                ret_bucket = (struct bucket *) data;
                sorted_data.insert(sorted_data.end(), ret_bucket->data.begin(), ret_bucket->data.end());
                //sorted_data.assign(ret_bucket->data.begin(), ret_bucket->data.end());
            }
            /*
             for(int index = 0; index < sorted_data.size(); index++)
             {
             std::cout << sorted_data[index] << " ";
             }
             */

            t = clock() - t;
            std::cout << t << " ";
            t = t/CLOCKS_PER_SEC;
            std::cout << "num thread: " << number_of_threads[thread_index] << " ";
            std::cout << "data size: " << N[size_index] << " ";
            std::cout << t << " ";
            time(&end);
            std::cout << end << std::endl;



            // Sort the original data sequentially to check the threaded result against it.
            std::sort(unsorted_data.begin(), unsorted_data.end());

            for(int index = 0; index < unsorted_data.size(); index++)
            {
                if(unsorted_data[index] != sorted_data[index])
                {
                    std::cout << "data sorting failed" << std::endl;
                }
            }
        }
    }
    int placeholder;
    std::cin >> placeholder;
    return 0;
}

void *sort_bucket(void *unsorted_bucket)
{
    // The argument is a pointer to the bucket this thread owns.
    bucket *temp_sorted_bucket = (struct bucket *) unsorted_bucket;
    std::sort(temp_sorted_bucket->data.begin(), temp_sorted_bucket->data.end());

    /*for(int index = 0; index < temp_sorted_bucket->data.size(); index++)
     {
     std::cout << temp_sorted_bucket->data.at(index) << " ";
     }*/
    // Returning from the start routine hands this pointer to pthread_join,
    // the same as calling pthread_exit with it.
    return temp_sorted_bucket;
}

Answer 1

Remember that your threads are limited by the number of physical cores on your CPU. Once you hit that limit, the system must spend resources switching between threads on the same core, which takes time. For example, an i3 processor has 2 physical cores with hyperthreading, which provides 4 virtual cores, so going past 4 threads will often bring no benefit.
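One way to act on this is to ask the machine how many hardware threads it reports and cap the number of workers at that value. The sketch below is only an illustration: the helper name effective_thread_count is hypothetical and not part of the question's code. It uses std::thread::hardware_concurrency(), which is allowed to return 0 when the count cannot be determined, so the helper falls back to a single thread in that case.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not from the original post): caps a requested worker
// count at the number of hardware threads. An `available` value of 0 means
// "unknown", and a request of 0 makes no sense, so both fall back to 1.
unsigned effective_thread_count(unsigned requested, unsigned available)
{
    if (requested == 0 || available == 0) {
        return 1;
    }
    return std::min(requested, available);
}

// Convenience overload that asks the standard library for the hardware
// thread count (logical cores, so hyperthreads are included).
unsigned effective_thread_count(unsigned requested)
{
    return effective_thread_count(requested,
                                  std::thread::hardware_concurrency());
}
```

With the i3 example above, effective_thread_count(32, 4) gives 4, so a loop over 1, 2, 4, 8, 16, 32 threads would stop gaining workers after 4. Whether the cap should be the logical or the physical core count depends on the workload; this sketch uses the logical count because that is what the standard library exposes.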

Answer 1 posted 2014-03-23 02:46:47
