Performance of OpenMP Parallel Programming in C

Question

I wrote a C program that computes Pi using OpenMP, with help from a book. I expect its performance to depend on the processor it runs on.

To see how well it parallelizes, I varied the number of threads (or processors? I am not sure which term is correct, please correct me) with the environment variable

OMP_NUM_THREADS

I have a quad-core processor, so I used (where no_of_threads is changed from 1 to 10):

$ export OMP_NUM_THREADS=no_of_threads

The run times I measured were:

threads --- time
1 --- 0m11.036s
2 --- 0m5.554s
3 --- 0m3.800s
4 --- 0m3.166s
5 --- 0m3.376s
8 --- 0m3.042s
10 --- 0m2.960s
15 --- 0m2.957s

I can understand the performance increase up to 4 threads, as there are 4 processors in the system. But I am unable to understand why performance keeps increasing once there are more than 4 threads. I am aware that each additional thread adds overhead, so why is the performance still improving?

Can someone please explain this to me in detail?

Answer 1

You probably have a processor that supports hardware threads (Intel calls this hyper-threading).

What this basically means is that each core keeps two sets of architectural state (registers, program counter) while sharing its execution units and caches, so it can interleave two threads more efficiently than a plain core. This is especially noticeable if the threads often have to wait for memory: usually, a core just stalls while waiting for memory¹. A core that supports hyper-threading can instead execute instructions from the other thread during that wait. That also fits your numbers: the big gains stop at 4 threads, and going beyond that only improves the time by a few percent.

¹ Not taking into account instruction reordering and prefetching.
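
If you want to check this on your own machine, here is a minimal sketch. It is not the program from your book: compute_pi, logical_processors, time_pi and print_scaling are names I made up for illustration, and the midpoint-rule integration of 4/(1+x^2) is only my assumption about how such a Pi program typically works. omp_get_num_procs() reports the number of logical processors the OpenMP runtime sees, so a quad-core with hyper-threading would normally report 8 rather than 4.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Midpoint-rule estimate of pi as the integral of 4/(1+x^2) over [0,1].
 * Assumed stand-in for the book's program, not a copy of it. */
double compute_pi(long num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    long i;

    /* Each thread accumulates a private partial sum; OpenMP adds them up. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Logical processors visible to the OpenMP runtime (1 if built without OpenMP). */
int logical_processors(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Run compute_pi with the requested thread count and return elapsed seconds. */
double time_pi(int threads, long num_steps, double *pi_out)
{
#ifdef _OPENMP
    omp_set_num_threads(threads);
    double start = omp_get_wtime();          /* wall-clock time */
    *pi_out = compute_pi(num_steps);
    return omp_get_wtime() - start;
#else
    (void)threads;                           /* serial build: count is ignored */
    clock_t start = clock();                 /* CPU time fallback */
    *pi_out = compute_pi(num_steps);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
#endif
}

/* Print one timing line per thread count from 1 to max_threads. */
void print_scaling(int max_threads, long num_steps)
{
    int t;
    printf("logical processors: %d\n", logical_processors());
    for (t = 1; t <= max_threads; t++) {
        double pi;
        double secs = time_pi(t, num_steps, &pi);
        printf("%2d threads --- %.3fs (pi ~ %.10f)\n", t, secs, pi);
    }
}
```

Call print_scaling(16, 1000000000L) from your main() and build with gcc -fopenmp. If the first line prints 8 on your quad-core, hyper-threading is enabled, and you should expect the times to flatten out somewhere around 8 threads rather than 4.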

1 answer

solution1
3 ACCEPTED 2011-01-16 11:33:02
