
OpenMP - Loop/array bounds for each thread

Is there an ICV (internal control variable) or something similar to query the upper and lower bounds of a loop in OpenMP?

The following calculation would give me the upper and lower bounds in some cases:

#pragma omp parallel for
for ( i = 0 ; i < n ; i++ ){
    int this_thread = omp_get_thread_num();
    int num_threads = omp_get_num_threads();
    /* guessed bounds of the iteration range handled by this thread */
    int lower_bound = (this_thread * n / num_threads);
    int upper_bound = ((this_thread+1) * n / num_threads) - 1;
...
}

For n=100 and 4 threads I get the correct lower_bound values of 0, 25, 50 and 75 and upper_bound values of 24, 49, 74 and 99 for threads 0, 1, 2 and 3.

If I change n to 99, the formula no longer gives the bounds the threads actually work on.

Does the calculation of the upper and lower bounds differ between GCC and Intel, or between C and C++ compilers?

There is no function in the OpenMP run-time library that will give you this information. Furthermore, the answer depends heavily on the schedule applied to the loop.

By default, in the absence of an explicit schedule clause, the schedule that gets applied is implementation-defined rather than fixed by the OpenMP standard. Many compilers use static scheduling, but that isn't always the case and is definitely not guaranteed.

Now, just to quote the OpenMP standard about static scheduling:

When schedule(static, chunk_size) is specified, iterations are divided into chunks of size chunk_size, and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number.
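To make the round-robin rule concrete, here is a small sketch of the mapping it implies. It assumes a loop running i = 0 .. n-1 with step 1; owner_of_iteration and print_chunk_owners are names made up for this illustration, not part of the OpenMP API.

```c
#include <stdio.h>

/* Thread that owns iteration i under schedule(static, chunk_size),
   following the round-robin rule quoted above.
   Assumes i >= 0, chunk_size > 0 and num_threads > 0. */
int owner_of_iteration(int i, int chunk_size, int num_threads)
{
    int chunk_index = i / chunk_size;   /* which chunk iteration i falls into */
    return chunk_index % num_threads;   /* chunks are dealt out in thread order */
}

/* Print which thread each chunk of an n-iteration loop would go to. */
void print_chunk_owners(int n, int chunk_size, int num_threads)
{
    for (int start = 0; start < n; start += chunk_size) {
        int end = start + chunk_size - 1;
        if (end > n - 1)
            end = n - 1;                /* the last chunk may be shorter */
        printf("iterations %d..%d -> thread %d\n", start, end,
               owner_of_iteration(start, chunk_size, num_threads));
    }
}
```

Calling print_chunk_owners(99, 10, 4) would list ten chunks, with the fifth chunk (iterations 40..49) going back to thread 0.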

When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each thread. The size of the chunks is unspecified in this case.

As you can see, even in this simple case, if no chunk size is given and the number of threads doesn't evenly divide the number of iterations, you cannot reliably determine the lower and upper bounds of each thread's iterations.
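As an illustration of why the bounds are ambiguous, here are two ways of splitting the iterations that both fit "approximately equal in size, at most one chunk per thread". The first is the formula from the question. The second is a hypothetical alternative that hands the leftover iterations to the lowest-numbered threads. Neither is claimed to be what any particular runtime does; iter_range, split_floor and split_remainder_first are names invented for this sketch.

```c
/* Inclusive iteration range [lower, upper] given to one thread. */
typedef struct {
    int lower;
    int upper;
} iter_range;

/* Split from the question: boundaries at floor(t * n / num_threads). */
iter_range split_floor(int t, int n, int num_threads)
{
    iter_range r;
    r.lower = t * n / num_threads;
    r.upper = (t + 1) * n / num_threads - 1;
    return r;
}

/* Hypothetical alternative: every thread gets n / num_threads iterations,
   and the first n % num_threads threads get one extra. */
iter_range split_remainder_first(int t, int n, int num_threads)
{
    int base  = n / num_threads;
    int extra = n % num_threads;
    iter_range r;
    r.lower = t * base + (t < extra ? t : extra);
    r.upper = r.lower + base + (t < extra ? 1 : 0) - 1;
    return r;
}
```

For n=100 and 4 threads both functions agree. For n=99 the first gives thread 0 the range 0..23 (24 iterations) while the second gives it 0..24 (25 iterations), so a formula that matches one split is wrong for the other.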

If you specify the chunk size explicitly, however, you should be able to compute the iteration bounds of each thread reliably.
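One possible way to do that, as a sketch: pick the chunk size as ceil(n / num_threads), so that there are at most num_threads chunks and the round-robin rule gives each thread at most one of them. The helper names chunk_size_for and static_chunk_bounds are made up for this example, and it assumes the loop runs i = 0 .. n-1 and that the team really has num_threads threads.

```c
/* Smallest chunk size for which num_threads chunks cover n iterations. */
int chunk_size_for(int n, int num_threads)
{
    return (n + num_threads - 1) / num_threads;   /* ceil(n / num_threads) */
}

/* Inclusive bounds of thread t under schedule(static, chunk) with the
   chunk size above. Returns 1 and fills *lower / *upper if the thread
   gets any iterations, 0 if it gets none. */
int static_chunk_bounds(int t, int n, int num_threads, int *lower, int *upper)
{
    int chunk = chunk_size_for(n, num_threads);
    int lo = t * chunk;
    if (lo >= n)
        return 0;                 /* more threads than chunks */
    int hi = lo + chunk - 1;
    if (hi > n - 1)
        hi = n - 1;               /* the last chunk may be shorter */
    *lower = lo;
    *upper = hi;
    return 1;
}
```

The loop itself would then carry schedule(static, chunk) with chunk = chunk_size_for(n, num_threads), where num_threads is the value omp_get_num_threads() reports inside the parallel region. For n=99 and 4 threads this gives a chunk size of 25 and bounds 0..24, 25..49, 50..74 and 75..98.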

Now if your scheduling isn't static, then there's absolutely no way of inferring which thread will get which iteration, since this is only determined at run time.
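Since the assignment is only known at run time, one way to see what a given compiler and runtime actually do is to record it while the loop runs. A minimal sketch, assuming a loop over 0 .. n-1: record_observed_bounds, the seen_* arrays and MAX_THREADS are names invented for this example, and the #ifdef fallback is only there so the code still builds when OpenMP is not enabled.

```c
#include <limits.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP enabled. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

#define MAX_THREADS 256   /* assumed upper limit for this sketch */

int seen_min[MAX_THREADS];    /* smallest iteration each thread executed */
int seen_max[MAX_THREADS];    /* largest iteration each thread executed  */
int seen_count[MAX_THREADS];  /* number of iterations each thread ran    */

/* Runs an n-iteration loop and records what every thread executed.
   Returns the number of slots that were reserved for threads. */
int record_observed_bounds(int n)
{
    int slots = omp_get_max_threads();
    if (slots > MAX_THREADS)
        slots = MAX_THREADS;

    for (int t = 0; t < slots; t++) {
        seen_min[t] = INT_MAX;
        seen_max[t] = INT_MIN;
        seen_count[t] = 0;
    }

    /* num_threads(slots) keeps the team within the reserved slots. */
    #pragma omp parallel for schedule(static) num_threads(slots)
    for (int i = 0; i < n; i++) {
        int t = omp_get_thread_num();   /* each thread writes only its own slot */
        if (i < seen_min[t]) seen_min[t] = i;
        if (i > seen_max[t]) seen_max[t] = i;
        seen_count[t]++;
    }
    return slots;
}
```

With schedule(static) the recorded minimum and maximum are the bounds of the thread's single chunk. With a dynamic or guided schedule the iterations a thread runs are generally not contiguous, so only seen_count is meaningful and it can change from run to run.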

With dynamic scheduling there are no fixed upper/lower bounds per thread. Each thread picks the next available element, or the next available chunk of elements, in the sequence. The chunk size is configurable.

Typically, an atomic increment will be used internally, e.g. InterlockedIncrement() on Windows.
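To illustrate the idea, here is a rough sketch of chunk grabbing with an atomic counter. It uses C11 atomic_fetch_add as a portable stand-in for an interlocked increment; work_queue and its functions are invented for this example and are not how any specific OpenMP runtime is known to implement it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared cursor: index of the first iteration nobody has claimed yet. */
typedef struct {
    atomic_int next;
    int n;       /* total number of iterations */
    int chunk;   /* iterations claimed per grab */
} work_queue;

void work_queue_init(work_queue *q, int n, int chunk)
{
    atomic_init(&q->next, 0);
    q->n = n;
    q->chunk = chunk;
}

/* Claims the next chunk as the half-open range [*begin, *end).
   Returns false once all iterations have been handed out. */
bool work_queue_grab(work_queue *q, int *begin, int *end)
{
    /* atomic_fetch_add returns the value before the addition, so two
       threads calling this at the same time get different start indices. */
    int start = atomic_fetch_add(&q->next, q->chunk);
    if (start >= q->n)
        return false;
    *begin = start;
    *end = (start + q->chunk < q->n) ? start + q->chunk : q->n;
    return true;
}
```

Each worker would call work_queue_grab in a loop until it returns false, so which thread ends up with which chunk depends purely on timing.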

Having fixed lower/upper bounds per thread would be a really bad idea, by the way. Suppose that element 27 takes 10 times longer to process than the rest. Then the unlucky thread that has to process this element would finish much later than the other threads. A specific thread can also be stalled by other CPU activity, so giving each thread a fixed number of elements can be very inefficient.


 