Determining how the OpenMP atomic directive is implemented

A compiler that implements the OpenMP standard may, but is not obliged to, use special hardware instructions to perform the memory update following a #pragma omp atomic directive atomically, avoiding expensive locks. According to http://gcc.gnu.org/onlinedocs/gccint/OpenMP.html, GCC implements an atomic update as follows:

Whenever possible, an atomic update built-in is used. If that fails, a compare-and-swap loop is attempted. If that also fails, a regular critical section around the expression is used.
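To make the three strategies in that quote concrete, here is a hand-written C sketch of each one applied to `*p += v`. It is not GCC's internal code: the function names are hypothetical, the first two use the legacy `__sync` built-ins (available since GCC 4.1, so in all the versions mentioned below), and a POSIX mutex stands in for the critical section (GCC's own fallback takes a lock inside the OpenMP runtime, libgomp).

```c
#include <pthread.h>

/* 1. Atomic built-in: a single read-modify-write instruction where the
 *    target has one (on x86 a lock-prefixed add or xadd). */
void add_builtin(int *p, int v)
{
    (void)__sync_fetch_and_add(p, v);
}

/* 2. Compare-and-swap loop: read the value, compute the new one, and
 *    retry if another thread changed *p in the meantime. */
void add_cas_loop(int *p, int v)
{
    int old;
    do {
        old = *p;  /* snapshot; the CAS below fails if it is outdated */
    } while (!__sync_bool_compare_and_swap(p, old, old + v));
}

/* 3. Critical section: an ordinary update protected by a lock. */
static pthread_mutex_t add_lock = PTHREAD_MUTEX_INITIALIZER;

void add_locked(int *p, int v)
{
    pthread_mutex_lock(&add_lock);
    *p += v;
    pthread_mutex_unlock(&add_lock);
}
```

All three give the same result; they differ only in cost and in what the hardware must provide. Compile with `-pthread` for the mutex variant.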

  1. How can I determine which of the three is actually used on a given machine and GCC version? Is there some verbosity option for GCC that I can set to find out, without having to profile my program or look at the generated assembly?

  2. Is there some documentation listing the CPUs/architectures that provide atomic add/increment/etc. instructions, so that I can predict the outcome for a given machine?

I'm using GCC versions 4.2 to 4.6 on a variety of different machines.

You can look at the intermediate tree representations with the -fdump-tree-all option. With that option, GCC writes a set of files at several intermediate steps, so you can follow the successive transformations applied to the tree. The .ompexp file is of particular interest here, since it contains the tree just after the OpenMP constructs have been expanded into their concrete implementations.

For example, the block inside the parallel region in the following simple code:

int main (void)
{
    int i = 0;

    #pragma omp parallel
    {
       #pragma omp atomic
       i++;
    }

    return i;
}

is transformed by GCC 4.7.2 on 64-bit Linux into the following (.ompexp dump):

;; Function main._omp_fn.0 (main._omp_fn.0, funcdef_no=1, decl_uid=1712, cgraph_uid=1)

main._omp_fn.0 (struct .omp_data_s.0 * .omp_data_i)
{
  int D.1726;
  int D.1725;
  int i [value-expr: *.omp_data_i->i];
  int * D.1723;
  int * D.1722;

<bb 2>:
  D.1722_2 = .omp_data_i_1(D)->i;
  D.1723_3 = &*D.1722_2;
  __atomic_fetch_add_4 (D.1723_3, 1, 0);
  return;

}

which finally ends up as the following machine code:

00000000004006af <main._omp_fn.0>:
  4006af:       55                      push   %rbp
  4006b0:       48 89 e5                mov    %rsp,%rbp
  4006b3:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
  4006b7:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4006bb:       48 8b 00                mov    (%rax),%rax
  4006be:       f0 83 00 01             lock addl $0x1,(%rax)
  4006c2:       5d                      pop    %rbp
  4006c3:       c3                      retq
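The same procedure can reveal the other two strategies. A minimal sketch, assuming x86-64 and a GCC with OpenMP support: the hypothetical file below performs atomic updates on three types of different size, and its comments state which expansion one would *expect* to find in the .ompexp dump, following the three-step order quoted above. These are expectations, not guarantees; the actual choice depends on the GCC version and target flags, so check the dump.

```c
/* Hypothetical test file, e.g. atomic_kinds.c.  Compile with
 *     gcc -fopenmp -fdump-tree-all -c atomic_kinds.c
 * and open the dump file whose name ends in ".ompexp" (the numeric
 * part of the file name varies between GCC versions).
 * The comments give the *expected* expansion on x86-64 only. */

/* int: an integer fetch-and-add built-in, e.g.
 * __atomic_fetch_add_4 (ptr, 1, 0) with GCC 4.7, where the final 0 is
 * the memory order __ATOMIC_RELAXED. */
int bump_int(int n)
{
    int counter = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        counter++;
    }
    return counter;
}

/* double: x86 has no atomic floating-point add, so a compare-and-swap
 * loop on the 8-byte bit pattern is the likely expansion. */
double sum_double(int n)
{
    double total = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        total += 0.5;
    }
    return total;
}

/* long double: 16 bytes on x86-64.  Without a 16-byte CAS (cmpxchg16b,
 * enabled by -mcx16) GCC may fall back to a lock, visible in the dump
 * as calls to GOMP_atomic_start () and GOMP_atomic_end (). */
long double sum_long_double(int n)
{
    long double total = 0.0L;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        total += 0.5L;
    }
    return total;
}
```

Without `-fopenmp` the pragmas are ignored and the functions simply run serially, which is convenient for checking the arithmetic. The additions of 0.5 are exact in both floating-point types, so the results do not depend on the order in which threads perform them.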

As for the second question, the answer may also depend on how GCC was built.

GCC defines the macros

#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 1

if the respective operations are available.
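These macros can be tested directly to see what a particular compiler and set of target flags offer. The helper below is a hypothetical sketch that returns a bit mask of the operand sizes, in bytes, for which a compare-and-swap is advertised. Note that the macros describe the target selected at compile time (for instance via `-march` or `-mcx16`), not the machine the binary eventually runs on.

```c
/* Returns a mask with bit value N set (N = 1, 2, 4, 8, 16) when the
 * compiler defines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_N, i.e. when a
 * compare-and-swap on N-byte operands is available for this target. */
unsigned cas_sizes_mask(void)
{
    unsigned mask = 0;
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1
    mask |= 1u;
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
    mask |= 2u;
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
    mask |= 4u;
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
    mask |= 8u;
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    mask |= 16u;
#endif
    return mask;
}
```

On x86-64 with default flags one would typically see the 1-, 2-, 4- and 8-byte bits set, with the 16-byte bit requiring `-mcx16` or an `-march` that includes it. If a given GCC version does not define these macros at all, the mask is simply 0. Without writing any code, `gcc -dM -E -x c /dev/null` lists all predefined macros for the current flags, including these.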

In general, you can expect compare-and-swap (either as a CAS instruction or as an LL/SC pair) on any architecture that supports multiple processors.

In addition, x86 has atomic increment and decrement (lock inc, lock dec), as well as atomic addition, as in the lock addl above.
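Because compare-and-swap is the common denominator, any update of the form x = f(x) can be built from it with a retry loop. The sketch below does this for multiplication, an update `#pragma omp atomic` accepts as `x *= expr` but for which x86 has no lock-prefixed instruction, unlike increment, decrement and add. The function name is hypothetical; on LL/SC architectures the compiler itself implements the `__sync` CAS as a load-linked/store-conditional loop.

```c
/* Atomically performs *p *= factor and returns the value *p had before.
 * Unsigned arithmetic is used so that overflow wraps instead of being
 * undefined behaviour. */
unsigned fetch_and_mul(unsigned *p, unsigned factor)
{
    unsigned old = *p;  /* first guess; validated by the CAS below */
    for (;;) {
        /* Store old * factor only if *p still equals old; the call
         * returns the value that was actually found in *p. */
        unsigned seen = __sync_val_compare_and_swap(p, old, old * factor);
        if (seen == old)
            return old;  /* our product was stored */
        old = seen;      /* another thread got there first: retry */
    }
}
```

Under contention the loop may run several times, which is why a single hardware instruction, where one exists, is preferred over this fallback.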
