
Compilation fails with OpenMP on Mac OS X Lion (memcpy and SSE intrinsics)

I have stumbled upon the following problem. The code snippet below does not link on Mac OS X with any Xcode version I have tried (4.4, 4.5):

#include <stdlib.h>
#include <string.h>
#include <emmintrin.h>

int main(int argc, char *argv[])
{
  char *temp;
#pragma omp parallel
  {
    __m128d v_a, v_ar;
    memcpy(temp, argv[0], 10);
    v_ar = _mm_shuffle_pd(v_a, v_a, _MM_SHUFFLE2 (0,1));
  }
}

The code is only an example and would segfault if you ran it. The point is that it does not build. The compilation is done using the following command line:

/Applications/Xcode.app/Contents/Developer/usr/bin/gcc test.c -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk -mmacosx-version-min=10.7 -fopenmp

 Undefined symbols for architecture x86_64:
"___builtin_ia32_shufpd", referenced from:
    _main.omp_fn.0 in ccJM7RAw.o
"___builtin_object_size", referenced from:
    _main.omp_fn.0 in ccJM7RAw.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status

The code compiles just fine when the -fopenmp flag is not passed to gcc. I googled around and found a solution for the first problem, the one connected with memcpy: add -fno-builtin or -D_FORTIFY_SOURCE=0 to the gcc argument list. I did not manage to solve the second problem (the SSE intrinsic).

Can anyone help me solve this? The questions:

  • most importantly: how do I get rid of the "___builtin_ia32_shufpd" error?
  • what exactly is the reason for the memcpy problem, and what does the -D_FORTIFY_SOURCE=0 flag actually do?

This is a bug in the way Apple's LLVM-backed GCC (llvm-gcc) transforms OpenMP regions and handles calls to built-ins inside them. The problem can be diagnosed by examining the intermediate tree dumps (obtainable by passing the -fdump-tree-all argument to gcc). Without OpenMP enabled, the following final code representation is generated (from test.c.016t.fap):

main (argc, argv)
{
  D.6544 = __builtin_object_size (temp, 0);
  D.6545 = __builtin_object_size (temp, 0);
  D.6547 = __builtin___memcpy_chk (temp, D.6546, 10, D.6545);
  D.6550 = __builtin_ia32_shufpd (v_a, v_a, 1);
}

This is a C-like representation of how the compiler sees the code internally after all transformations; it is what then gets turned into assembly instructions. (Only the lines that refer to the built-ins are shown here.)

With OpenMP enabled, the parallel region is extracted into its own function, main.omp_fn.0:

main.omp_fn.0 (.omp_data_i)
{
  void * (*<T4f6>) (void *, const <unnamed type> *, long unsigned int, long unsigned int) __builtin___memcpy_chk.21;
  long unsigned int (*<T4f5>) (const <unnamed type> *, int) __builtin_object_size.20;
  vector double (*<T6b5>) (vector double, vector double, int) __builtin_ia32_shufpd.23;
  long unsigned int (*<T4f5>) (const <unnamed type> *, int) __builtin_object_size.19;

  __builtin_object_size.19 = __builtin_object_size;
  D.6587 = __builtin_object_size.19 (D.6603, 0);
  __builtin_ia32_shufpd.23 = __builtin_ia32_shufpd;
  D.6593 = __builtin_ia32_shufpd.23 (v_a, v_a, 1);
  __builtin_object_size.20 = __builtin_object_size;
  D.6588 = __builtin_object_size.20 (D.6605, 0);
  __builtin___memcpy_chk.21 = __builtin___memcpy_chk;
  D.6590 = __builtin___memcpy_chk.21 (D.6609, D.6589, 10, D.6588);
}

Again, I have only kept the code that refers to the built-ins. What is apparent (although the reason for it is not immediately apparent to me) is that the OpenMP code transformer insists on calling all the built-ins through function pointers. These pointer assignments:

__builtin_object_size.19 = __builtin_object_size;
__builtin_ia32_shufpd.23 = __builtin_ia32_shufpd;
__builtin_object_size.20 = __builtin_object_size;
__builtin___memcpy_chk.21 = __builtin___memcpy_chk;

generate external references to symbols that are not really symbols but rather names that get special treatment from the compiler. The linker then tries to resolve them but cannot find any of the __builtin_* names in any of the object files the code is linked against. This is also observable in the assembly code, which can be obtained by passing -S to gcc:

LBB2_1:
    movapd  -48(%rbp), %xmm0
    movl    $1, %eax
    movaps  %xmm0, -80(%rbp)
    movaps  -80(%rbp), %xmm1
    movl    %eax, %edi
    callq   ___builtin_ia32_shufpd
    movapd  %xmm0, -32(%rbp)

This is basically a function call that takes 3 arguments: one integer in %edi (staged through %eax) and two XMM arguments in %xmm0 and %xmm1, with the result returned in %xmm0, as per the SysV AMD64 ABI calling convention. In contrast, the code generated without -fopenmp is an instruction-level expansion of the intrinsic, which is what is supposed to happen:

LBB1_3:
    movapd  -64(%rbp), %xmm0
    shufpd  $1, %xmm0, %xmm0
    movapd  %xmm0, -80(%rbp)
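For reference, here is a small sketch of what that single shufpd $1 computes. shufpd_model is a hypothetical scalar model written for illustration: lane 0 of the result is taken from the first operand (selected by bit 0 of the immediate) and lane 1 from the second operand (selected by bit 1). Since _MM_SHUFFLE2(0,1) evaluates to 1 and both operands are v_a, the effect is simply swapping the two doubles. The SSE2 comparison function is only compiled on targets that define __SSE2__.

```c
/* Hypothetical scalar model of SHUFPD / _mm_shuffle_pd:
   result lane 0 = a[bit 0 of imm], result lane 1 = b[bit 1 of imm]. */
static void shufpd_model(const double a[2], const double b[2], int imm,
                         double out[2])
{
    double r0 = a[imm & 1];
    double r1 = b[(imm >> 1) & 1];
    out[0] = r0;   /* temporaries let out alias a or b safely */
    out[1] = r1;
}

#if defined(__SSE2__)
#include <emmintrin.h>
/* The same operation through the intrinsic used in the question. */
static void shufpd_swap_sse2(const double in[2], double out[2])
{
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_shuffle_pd(v, v, _MM_SHUFFLE2(0, 1)));
}
#endif
```

With both operands equal and an immediate of 1, {1.0, 2.0} becomes {2.0, 1.0}.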

When you pass -D_FORTIFY_SOURCE=0, memcpy is not replaced by the "fortified" checking version, and a regular call to memcpy is used instead. This eliminates the references to __builtin_object_size and __builtin___memcpy_chk but cannot remove the call to the __builtin_ia32_shufpd built-in.
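The idea behind the fortified path can be sketched in plain C as follows. This is not the real Apple libc header code: checked_memcpy and CHECKED_MEMCPY are hypothetical names, and a real fortified library aborts on overflow rather than returning an error. The part that matches the tree dump above is that the destination size comes from __builtin_object_size, which yields (size_t)-1 when the compiler cannot determine it. Those are exactly the two built-ins that end up being called through pointers inside main.omp_fn.0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a __memcpy_chk-style checked copy.
   dest_size is the destination object size as computed by the compiler,
   or (size_t)-1 when that size is unknown. */
static int checked_memcpy(void *dest, const void *src, size_t len,
                          size_t dest_size)
{
    if (dest_size != (size_t)-1 && len > dest_size)
        return -1;             /* a fortified libc would abort here instead */
    memcpy(dest, src, len);
    return 0;
}

/* Rough shape of the rewrite done by fortified headers: the compiler
   supplies the object size via the GCC/Clang built-in. */
#define CHECKED_MEMCPY(dest, src, len) \
    checked_memcpy((dest), (src), (len), __builtin_object_size((dest), 0))
```

With fortification disabled, the macro layer disappears and only the plain memcpy call remains.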

This is obviously a compiler bug. If you really, really must use Apple's GCC to compile the code, an interim solution is to move the offending code into an external function, since the bug apparently only affects code that gets extracted from parallel regions:

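#include <string.h>
#include <emmintrin.h>

/* memcpy and the SSE intrinsic now live outside the parallel region,
   so they are not part of the outlined main.omp_fn.0 */
void func(char *temp, char *argv0)
{
   __m128d v_a, v_ar;
   memcpy(temp, argv0, 10);
   v_ar = _mm_shuffle_pd(v_a, v_a, _MM_SHUFFLE2 (0,1));
}

int main(int argc, char *argv[])
{
  char *temp;
#pragma omp parallel
  {
    func(temp, argv[0]);
  }
}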

The overhead of one additional function call is negligible compared to the overhead of entering and exiting the parallel region. You can use OpenMP pragmas inside func; they will work because of the dynamic scoping of the parallel region.
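A minimal sketch of such an orphaned directive, assuming a hypothetical array-scaling task (scale_in_place and scale_parallel are illustrative names, not from the question). The omp for inside the callee binds to whichever parallel region is active at the call, so the iterations are split across that team. If it is called outside any region, or compiled without -fopenmp, the loop simply runs serially with the same result.

```c
/* Hypothetical helper with an orphaned work-sharing directive: the
   "omp for" binds to the enclosing parallel region of the caller, so the
   loop iterations are divided among that team's threads. */
void scale_in_place(double *x, int n, double factor)
{
    int i;
#pragma omp for
    for (i = 0; i < n; i++)
        x[i] *= factor;
}

/* Caller in the style of the workaround: the parallel region contains
   only a function call, and the work sharing happens in the callee. */
void scale_parallel(double *x, int n, double factor)
{
#pragma omp parallel
    {
        scale_in_place(x, n, factor);
    }
}
```

Keeping intrinsics and fortified library calls in such helpers means the outlined region body only ever contains an ordinary function call.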

Maybe Apple will provide a fixed compiler in the future, maybe they won't, given their commitment to replacing GCC with Clang.
