How to update all the elements in a double array in x86?

I am new to x86 and I am stuck on updating a double array using the values of another double array. The following code is my function, and I want to use inline assembly to replace the code inside the loop. I have attached the error messages below. Can anyone help me point out my errors? I am confused by the error messages and don't know how to fix the code.

static inline void update(double * x,double * y,double * z,double * vx,
        double * vy,double * vz,uint32_t size){
        for (uint32_t i=0;i<size;++i){
            x[i] = x[i] + vx[i];
            y[i] = y[i] + vy[i];
            z[i] = z[i] + vz[i];
        }
}
uint32_t counter = 0;
__asm__ __volatile__ (  
    "loop: \n\t" 
    "faddq (%4), (%1)\n\t"
    "faddq (%5), (%2)\n\t"
    "faddq (%6), (%3)\n\t"
    "addq $8, %1\n\t"
    "addq $8, %2\n\t"
    "addq $8, %3\n\t"
    "addq $8, %4\n\t"
    "addq $8, %5\n\t"
    "addq $8, %6\n\t"
    "incq %0\n\t"
    "cmp %0, %7\n\t"
    "jne loopb"
    : "+r"(counter)
    : "r" (x),"r" (y),"r"(z),"r"(vx),"r"(vy),"r"(vz),"r"(size) 
    : "memory", "cc");

Error Messages:

update_locations_ass.c:150:15: error: invalid instruction mnemonic 'faddq'
        "loop: \n\t" 
                 ^
<inline asm>:2:2: note: instantiated into assembly here
        faddq (%rdi), (%rcx)
        ^~~~~
update_locations_ass.c:151:25: error: invalid instruction mnemonic 'faddq'
        "faddq (%4), (%1)\n\t"
                           ^
<inline asm>:3:2: note: instantiated into assembly here
        faddq (%r8), (%rdx)
        ^~~~~
update_locations_ass.c:152:28: error: invalid instruction mnemonic 'faddq'
        "faddq (%5), (%2)\n\t"
                           ^
<inline asm>:4:2: note: instantiated into assembly here
        faddq (%r9), (%rsi)
        ^~~~~
update_locations_ass.c:159:23: error: invalid operand for instruction
        "addq $8, %6\n\t"
                      ^
<inline asm>:11:7: note: instantiated into assembly here
        incq %eax

Compiler version:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

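I'm equally confused here. What is faddq, and where did you get it from? Is it supposed to be fadd? You can't use two memory operands with fadd anyway (most x86 instructions accept at most one memory operand), so the code looks completely incorrect. The last error is a separate problem: counter is a uint32_t, so the compiler assigns it a 32-bit register (%eax), and incq only accepts a 64-bit operand. If you're curious about the correct way to do it, try compiling with -S and -O2 so you can look at optimized compiler output.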
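If the goal is to learn inline assembly, the following is a minimal sketch of a working loop, assuming x86-64 and GCC/Clang extended asm. It swaps the x87 idea for SSE2 scalar instructions (movsd/addsd), so each instruction has at most one memory operand. It uses a 64-bit size_t counter so that incq is valid, and it uses a numeric local label (1: / 1b) instead of a named one, which stays valid even if the function is inlined more than once. The helper name add_arrays_asm is hypothetical. On non-x86-64 targets it falls back to the plain C loop.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] += src[i] for i in [0, n).
   Sketch only: x86-64, GCC/Clang extended asm (AT&T syntax), SSE2 scalar ops. */
static void add_arrays_asm(double *dst, const double *src, size_t n)
{
#if defined(__x86_64__)
    size_t i = 0;              /* 64-bit counter, so incq/cmpq are valid */
    if (n == 0)
        return;                /* the loop below runs at least once */
    __asm__ __volatile__ (
        "1:\n\t"
        "movsd (%[dst],%[i],8), %%xmm0\n\t"  /* xmm0 = dst[i]                  */
        "addsd (%[src],%[i],8), %%xmm0\n\t"  /* xmm0 += src[i], one mem operand */
        "movsd %%xmm0, (%[dst],%[i],8)\n\t"  /* dst[i] = xmm0                  */
        "incq %[i]\n\t"                      /* ++i                            */
        "cmpq %[n], %[i]\n\t"                /* compare i with n               */
        "jb 1b\n\t"                          /* loop while i < n (unsigned)    */
        : [i] "+r" (i)
        : [dst] "r" (dst), [src] "r" (src), [n] "r" (n)
        : "xmm0", "memory", "cc");
#else
    for (size_t i = 0; i < n; ++i)           /* portable fallback */
        dst[i] += src[i];
#endif
}
```

Used for the question's arrays, this would be called three times: add_arrays_asm(x, vx, size), then the same for y/vy and z/vz.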

If you want a faster version of the function, it's easiest to just do it in C anyway. Assuming that the arrays don't overlap, here is a much faster version:

// Assuming x and vx do not overlap
void update1(double *restrict x, const double *restrict vx, unsigned count) {
    for (unsigned i = 0; i < count; i++) {
        x[i] += vx[i];
    }
}

void update(/* ... */) {
    update1(x, vx, count);
    update1(y, vy, count);
    update1(z, vz, count);
}
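For reference, the following is a complete, compilable version of the two functions above. The parameter list of update() is an assumption modeled on the question's function, with const added to the velocity arrays and unsigned used for the count to match update1(). Like the original, it assumes none of the arrays overlap.

```c
#include <stddef.h>

/* x[i] += vx[i]. `restrict` promises the compiler that x and vx do not
   overlap, which lets it vectorize the loop. */
static void update1(double *restrict x, const double *restrict vx, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        x[i] += vx[i];
}

/* Parameter list assumed from the question's update(); assumes no overlap. */
static void update(double *x, double *y, double *z,
                   const double *vx, const double *vy, const double *vz,
                   unsigned count)
{
    update1(x, vx, count);
    update1(y, vy, count);
    update1(z, vz, count);
}
```

Building it with -O3 -S and reading the generated assembly shows whether the loop in update1() was vectorized.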

If you compile with -O3, the compiler will vectorize the loop with packed instructions such as addpd (or their AVX equivalents, depending on your compilation target). This is going to be miles better than anything you could write yourself using the x87 FPU instructions.
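To see roughly what that vectorized loop does, the following is a hand-written sketch using SSE2 intrinsics from emmintrin.h. _mm_add_pd corresponds to the addpd instruction and adds two doubles per step. The name update1_sse2 is hypothetical, and this is only an illustration: the compiler's own -O3 output may unroll the loop, use wider AVX registers, or handle the leftover element differently.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd */
#endif

/* Hypothetical hand-vectorized equivalent of update1(): x[i] += vx[i]. */
static void update1_sse2(double *x, const double *vx, size_t count)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= count; i += 2) {
        __m128d a = _mm_loadu_pd(x + i);    /* x[i], x[i+1] (unaligned load)  */
        __m128d b = _mm_loadu_pd(vx + i);   /* vx[i], vx[i+1]                 */
        _mm_storeu_pd(x + i, _mm_add_pd(a, b));  /* addpd: two sums at once */
    }
#endif
    for (; i < count; i++)   /* leftover element, or everything without SSE2 */
        x[i] += vx[i];
}
```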

These simple functions (just adding arrays to other arrays) are very easy for the compiler to optimize, so unless you are teaching yourself assembly language, just let the compiler do it for you.
