
Parallel sum of elements in a large Array

I have a program that sums the elements of a very large array. I want to parallelize this sum.

#define N some_very_large_no  // say 1e12
float x[N];  // read from a file
float sum = 0.0;

int main()
{
    for (long long i = 0; i < N; i++)
        sum = sum + x[i];
}

How can I parallelize this sum using threads (C, C++ or Java; any code example is fine)? How many threads should I use for optimal performance if there are 8 cores in the machine?

EDIT: N may be really large (larger than 1e6, actually) and varies with the size of the file I read the data from. The file is on the order of GBs.

Edit: N is changed to a large value (1e12 to 1e16)

In Java you can write

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

int cpus = Runtime.getRuntime().availableProcessors();
// In a real program you would keep this pool around for other tasks as well.
ExecutorService service = Executors.newFixedThreadPool(cpus);

final float[] floats = new float[N];  // filled from the file

List<Future<Double>> tasks = new ArrayList<>();
int blockSize = (floats.length + cpus - 1) / cpus;
// One task per CPU, each summing its own block of the array.
for (int i = 0; i < cpus; i++) {
    final int start = blockSize * i;
    final int end = Math.min(blockSize * (i + 1), floats.length);
    tasks.add(service.submit(new Callable<Double>() {
        public Double call() {
            double d = 0;
            for (int j = start; j < end; j++)
                d += floats[j];
            return d;
        }
    }));
}
double sum = 0;
for (Future<Double> task : tasks)
    sum += task.get();  // may throw InterruptedException or ExecutionException
service.shutdown();

As WhozCraig mentions, it is likely that one million floats isn't enough to need multiple threads, or you could find that your bottleneck is how fast you can load the array from main memory (a resource that all the threads share). In any case, you can't assume it will be faster once you include the cost of getting the data.

You say that the array comes from a file. If you time the different parts of the program, you'll find that summing the elements takes a negligible amount of time compared to reading the data from disk. From Amdahl's Law it follows that there is almost nothing to be gained by parallelising the summation.
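To put a number on that, here is a minimal sketch of the Amdahl's Law bound in Java. The class and method names are made up for illustration, and the 5% figure for the time spent summing is an assumption, not a measurement; you would substitute the fraction you actually observe when timing your program.

```java
// Hypothetical helper, not part of any answer above: the Amdahl's Law bound.
final class AmdahlSketch {
    /**
     * Upper bound on the overall speedup when a fraction of the run time
     * (parallelFraction, between 0 and 1) is spread perfectly over `threads` threads
     * and the rest stays serial.
     */
    static double speedup(double parallelFraction, int threads) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);
    }

    public static void main(String[] args) {
        // Assumed split: 5% of the time summing, 95% reading the file.
        System.out.printf("8 threads:    %.3fx%n", speedup(0.05, 8));
        // Even with a huge number of threads the bound stays below 1 / 0.95.
        System.out.printf("1000 threads: %.3fx%n", speedup(0.05, 1000));
    }
}
```

Under that assumed split, 8 threads can make the whole program at most about 4.6% faster, which is why the file reading is the part worth measuring first.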

If you need to improve performance, you should focus on improving the I/O throughput.
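One way to approach this, sketched in Java under some loud assumptions: the file holds raw 4-byte big-endian floats (the layout `DataOutputStream.writeFloat` produces), and the 1 MiB buffer size is an arbitrary starting point rather than a tuned value. The class name is hypothetical. The point of the sketch is that the values are summed as they stream in, so the array never has to fit in memory.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: sum a binary file of floats while streaming it.
final class StreamingSumSketch {
    /** Sums big-endian 4-byte floats from `file`, accumulating in a double. */
    static double sumFloats(Path file) throws IOException {
        // Assumes the file contains nothing but floats; trailing bytes are ignored.
        long count = Files.size(file) / Float.BYTES;
        double sum = 0.0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file), 1 << 20))) {
            for (long i = 0; i < count; i++) {
                sum += in.readFloat();
            }
        }
        return sum;
    }
}
```

Timing this single-threaded version against the raw read speed of the disk would show how much room, if any, is left for threads to help.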

You can use more threads than cores, but the right number of threads, and the performance you get, depends on how your algorithm works. If the array length is 100000, create a number of threads and have each one sum the elements from arr[x] up to arr[x+limit]. Set limit so that no thread overlaps another and no element is left out. Thread creation:

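#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define COUNT 8                            /* number of threads */
#define LEN   100000                       /* array length */
#define LIMIT ((LEN + COUNT - 1) / COUNT)  /* elements per thread */

float arr[LEN];
double partial[COUNT];  /* one partial sum per thread */
int start[COUNT];       /* first index handled by each thread */

void *doSomeThing(void *arg);

int main(void)
{
    pthread_t tid[COUNT];
    int i = 0;
    int err;
    while (i < COUNT)
    {
        /* pass the index from which this thread will start summing arr */
        start[i] = i * LIMIT;
        err = pthread_create(&tid[i], NULL, &doSomeThing, &start[i]);
        if (err != 0)
        {
            printf("\ncan't create thread :[%s]", strerror(err));
            return 1;
        }
        i++;
    }
    /* wait for every thread, then add up the partial sums */
    double sum = 0.0;
    for (i = 0; i < COUNT; i++)
    {
        pthread_join(tid[i], NULL);
        sum += partial[i];
    }
    printf("sum = %f\n", sum);
    return 0;
}

void *doSomeThing(void *arg)
{
    int x = *(int *) arg;  /* start index for this thread */
    int end = (x + LIMIT < LEN) ? x + LIMIT : LEN;
    double s = 0.0;
    /* sum arr[x] up to the limit, which does not overlap any other thread */
    for (int j = x; j < end; j++)
        s += arr[j];
    partial[x / LIMIT] = s;
    return NULL;
}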

Use a divide-and-conquer algorithm:

  • Divide the array into two or more sub-arrays (keep dividing recursively until each sub-array has a manageable size)
  • Compute the sum of each sub-array, using separate threads
  • Finally, add the sums produced by all the threads together to get the final result
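The steps above can be sketched in Java with the standard fork/join framework. This is only one possible reading of the idea: the class name is hypothetical, and the THRESHOLD of 100,000 elements is an assumed cut-off that you would want to tune by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch of a recursive divide-and-conquer sum.
final class ForkJoinSumSketch extends RecursiveTask<Double> {
    // Assumed cut-off: ranges this small are summed directly in one thread.
    private static final int THRESHOLD = 100_000;

    private final float[] data;
    private final int from;  // inclusive
    private final int to;    // exclusive

    ForkJoinSumSketch(float[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= THRESHOLD) {
            // Manageable size: sum sequentially, accumulating in a double.
            double s = 0.0;
            for (int i = from; i < to; i++) {
                s += data[i];
            }
            return s;
        }
        // Otherwise split in half, run the left half asynchronously
        // and the right half in the current thread.
        int mid = from + (to - from) / 2;
        ForkJoinSumSketch left = new ForkJoinSumSketch(data, from, mid);
        ForkJoinSumSketch right = new ForkJoinSumSketch(data, mid, to);
        left.fork();
        double rightSum = right.compute();
        return left.join() + rightSum;
    }

    /** Convenience entry point using the JVM-wide common pool. */
    static double sum(float[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSumSketch(data, 0, data.length));
    }
}
```

Calling `ForkJoinSumSketch.sum(floats)` returns the total. The common pool sizes itself from the number of available processors by default, so there is no thread count to pick by hand.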

As others have said, the time-cost of reading the file is almost certainly going to be much larger than that of calculating the sum. Is it a text file or binary? If the numbers are stored as text, then the cost of reading them can be very high depending on your implementation.

You should also be careful adding a large number of floats. Because of their limited precision, small values late in the array may not contribute to the sum. Think about at least using a double to accumulate the values.
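A small Java sketch makes the effect visible. The class and method names are made up for illustration, and it sums one repeated value instead of a real array so that it needs no input file.

```java
// Hypothetical demonstration of float versus double accumulation.
final class FloatAccumulationSketch {
    /** Adds `value` to a float accumulator `count` times. */
    static float sumInFloat(float value, int count) {
        float s = 0f;
        for (int i = 0; i < count; i++) {
            s += value;  // rounded to float precision after every addition
        }
        return s;
    }

    /** Adds `value` to a double accumulator `count` times. */
    static double sumInDouble(float value, int count) {
        double s = 0.0;
        for (int i = 0; i < count; i++) {
            s += value;  // value is widened to double before the addition
        }
        return s;
    }

    public static void main(String[] args) {
        int count = 20_000_000;
        System.out.println("float accumulator:  " + sumInFloat(1.0f, count));
        System.out.println("double accumulator: " + sumInDouble(1.0f, count));
    }
}
```

With 20,000,000 copies of 1.0f, the float accumulator stops growing at 16,777,216 (2^24), because adding 1 to that value no longer changes a float. The double accumulator reaches the full 20,000,000.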


 