
Copy from memory to register in C++ using assembly code

I have a problem converting a C++ program to assembly. I have to do it for

Here is my C++ code:

for(int i=0;i<rows-4;i++,a+=4,b+=4,c+=4,d+=4,e+=4,f+=4,x+=4,o+=4){
  for(int j=0;j<cols-4;j++,a++,b++,c++,d++,e++,f++,x++,o++){
    *o=*a>*x;
    *o=*b>*x|(*o<<1);
    *o=*c>*x|(*o<<1);
    *o=*d>*x|(*o<<1);
    *o=*e>*x|(*o<<1);
    *o=*f>*x|(*o<<1);
    }
}

o is a pointer to the output data, while a, b, c, d, e, f and x are pointers to the input data. What I want is to save the comparison results from the input data into a single variable, but the code above is not efficient when the data being processed is large. The program needs more time to save data into memory than to keep temporary data in a register.

So what I want to do is make this process happen in registers. What I've tried: I store the value referred to by x in EBX, compare EBX to ECX, which holds the value referred to by a (and then b, c, d, e, f in turn), save the comparison result to EAX, and shift EAX left so that all the comparisons are stored in one variable. After all 6 comparisons have been processed, the value in EAX is copied to memory.

Here is what I did. My program runs two times faster, but all the values I get are just zero. Maybe I am doing it the wrong way?

      __asm__(
"xorl %%eax,%%eax;"
"xorl %%ebx,%%ebx;"
"xorl %%ecx,%%ecx;"

"movl %1, %%ebx;"

//start here
"movl %2,%%ecx;"
"cmp %%ebx,%%ecx;"
"jnz .one;"
"orl $0x1,%%eax;"

".one:;"
"shll $1,%%eax;"
"movl %3,%%ecx;"
"cmp %%ebx,%%ecx;"
"jnz .two;"
"orl $0x1,%%eax;"

".two:;"
"shll $1,%%eax;"
"movl %4,%%ecx;"
"cmp %%ebx,%%ecx;"
"jnz .three;"
"orl $0x1,%%eax;"

".three:;"
"shll $1,%%eax;"
"movl %5,%%ecx;"
"cmp %%ebx,%%ecx;"
"jnz .four;"
"orl $0x1,%%eax;"

".four:"
"shll $1,%%eax;"
"movl %6,%%ecx;"
"cmp %%ebx,%%ecx;"
"jnz .five;"
"orl $0x1,%%eax;"

".five:"
"shll $1,%%eax;"
"movl %7,%%ecx;"
"cmp %%ebx,%%ecx;"
"jnz .six;"
"orl $0x1,%%eax;"

".six:"
//output
"movl %%eax,%0;"

:"=r"(sett)
:"r"((int)*x),"r"((int)*a) ,"r"((int)*b) ,"r"((int)*c) ,"r"((int)*d),"r"((int)*e),"r"((int)*f) /* input */
  );

A few options:

1) Throw away your handcrafted assembly code. You said the C code is slow; tell us by how much. I can't see how you could have measured the difference in any meaningful way, as the asm version doesn't even produce the correct result. Put another way, try asm("nop;"); it's an even faster way to produce an incorrect result.

2) Rewrite your C code to read *x only once; keep the result in a temporary variable, and only write to *o at the end.

3) If appropriate for your semantics (and supported by your compiler), decorate your pointers with restrict / __restrict / __restrict__ (from C99, commonly available in C++ as an extension) so the compiler knows none of the input variables change when you write to *o.

4) Compilers are fairly good at unrolling loops automatically. It might require a combination of command-line options, #pragma directives, or extensions/attributes.
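Options 2 and 3 can be combined in one routine. The following is only a minimal sketch under stated assumptions: the elements are taken to be 32-bit unsigned integers, the data is treated as one flat run of n elements rather than the row/column walk in the question, and the names pack_compare6 and RESTRICT are made up for illustration. The restrict qualifier is a compiler extension in C++, so the macro picks the spelling used by GCC/Clang or MSVC.

```cpp
#include <cstddef>
#include <cstdint>

// Restrict is not standard C++; these spellings are compiler extensions.
#if defined(_MSC_VER)
#define RESTRICT __restrict
#else
#define RESTRICT __restrict__
#endif

// Hypothetical helper: for each element, packs the six comparisons
// a>x, b>x, c>x, d>x, e>x, f>x into the low six bits of o[i]
// (a ends up in bit 5, f in bit 0, as in the question's shift sequence).
void pack_compare6(const std::uint32_t* RESTRICT a,
                   const std::uint32_t* RESTRICT b,
                   const std::uint32_t* RESTRICT c,
                   const std::uint32_t* RESTRICT d,
                   const std::uint32_t* RESTRICT e,
                   const std::uint32_t* RESTRICT f,
                   const std::uint32_t* RESTRICT x,
                   std::uint32_t* RESTRICT o,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t xv = x[i];   // read x once
        std::uint32_t bits = a[i] > xv;  // accumulate in a local, not in *o
        bits = (bits << 1) | (b[i] > xv);
        bits = (bits << 1) | (c[i] > xv);
        bits = (bits << 1) | (d[i] > xv);
        bits = (bits << 1) | (e[i] > xv);
        bits = (bits << 1) | (f[i] > xv);
        o[i] = bits;                     // single store per element
    }
}
```

Because the accumulator is a local variable and o is promised not to overlap the inputs, the compiler is free to keep everything in registers until the final store, which is what the inline assembly was trying to achieve by hand. The promise must actually hold: calling this with o overlapping any input is undefined behavior.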

EDIT

This is what I mean by rewriting it to use temporaries:

for(int i=0;i<rows-4;i++,a+=4,b+=4,c+=4,d+=4,e+=4,f+=4,x+=4,o+=4){
    for(int j=0;j<cols-4;j++,a++,b++,c++,d++,e++,f++,x++,o++){
        uint32_t tmp_x = *x;
        *o = (*a > tmp_x ? 0x20 : 0)
          |  (*b > tmp_x ? 0x10 : 0)
          |  (*c > tmp_x ? 0x08 : 0)
          |  (*d > tmp_x ? 0x04 : 0)
          |  (*e > tmp_x ? 0x02 : 0)
          |  (*f > tmp_x ? 0x01 : 0);
    }
}

What difference does it make? In the original version, *x is read in every single assignment. The compiler doesn't know that o and x point to different locations; in the worst case it has to read *x again every single time, because writing to *o could have changed the value that x points to.

Of course, this code has different semantics: if you really are letting o alias any of the other pointers, it will do something different from the original.
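To make the aliasing point concrete, here is a small sketch that puts the two styles side by side for a single element. The function names pack_reread and pack_cached are hypothetical, and 32-bit unsigned elements are assumed.

```cpp
#include <cstdint>

// Original style: *o is rewritten after every comparison, so *x must be
// re-read each time in case o and x refer to the same location.
void pack_reread(const std::uint32_t* a, const std::uint32_t* b,
                 const std::uint32_t* c, const std::uint32_t* d,
                 const std::uint32_t* e, const std::uint32_t* f,
                 const std::uint32_t* x, std::uint32_t* o) {
    *o = *a > *x;
    *o = (*b > *x) | (*o << 1);
    *o = (*c > *x) | (*o << 1);
    *o = (*d > *x) | (*o << 1);
    *o = (*e > *x) | (*o << 1);
    *o = (*f > *x) | (*o << 1);
}

// Temporary-variable style: *x is read once and *o is written once.
void pack_cached(const std::uint32_t* a, const std::uint32_t* b,
                 const std::uint32_t* c, const std::uint32_t* d,
                 const std::uint32_t* e, const std::uint32_t* f,
                 const std::uint32_t* x, std::uint32_t* o) {
    const std::uint32_t xv = *x;
    *o = (*a > xv ? 0x20u : 0u)
       | (*b > xv ? 0x10u : 0u)
       | (*c > xv ? 0x08u : 0u)
       | (*d > xv ? 0x04u : 0u)
       | (*e > xv ? 0x02u : 0u)
       | (*f > xv ? 0x01u : 0u);
}
```

With distinct x and o the two functions agree. If x and o point at the same variable holding 3 and every input is 5, pack_reread ends with 56 (each store changes the value the next comparison sees), while pack_cached ends with 63.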

I am going to assume you are using a recent Intel chip, and what I think you really want to use are the vector capabilities (rather limited if one is used to, say, a Cray :-)); these are called AVX. There are also libraries that expose this in C/C++; start by googling AVX and C.

Having said that, you could also hint to the compiler that some variables should be kept in registers by using the "register" keyword (it is only a hint, and the keyword was deprecated in C++11 and removed in C++17); see "Register keyword in C++".
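As a rough illustration of the vector approach, the sketch below processes four elements per step with compiler intrinsics. Several things here are assumptions rather than anything stated in the question: the elements are taken to be signed 32-bit integers (the question's asm casts to int), the data is treated as one flat run of n elements, and the names pack_compare6_simd and HAVE_SSE2 are invented. It uses SSE2 instead of AVX because SSE2 is present on every x86-64 CPU; with AVX2 the same pattern works on eight lanes via _mm256_cmpgt_epi32. On other targets the code falls back to the scalar loop.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1     // hypothetical feature macro for this sketch
#endif

// Hypothetical helper: same bit layout as the scalar code
// (a in bit 5 ... f in bit 0), four elements at a time where possible.
void pack_compare6_simd(const std::int32_t* a, const std::int32_t* b,
                        const std::int32_t* c, const std::int32_t* d,
                        const std::int32_t* e, const std::int32_t* f,
                        const std::int32_t* x, std::int32_t* o,
                        std::size_t n) {
    const std::int32_t* in[6] = {a, b, c, d, e, f};
    std::size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i one = _mm_set1_epi32(1);
    for (; i + 4 <= n; i += 4) {
        const __m128i xv =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(x + i));
        __m128i bits = _mm_setzero_si128();
        for (int k = 0; k < 6; ++k) {
            const __m128i v =
                _mm_loadu_si128(reinterpret_cast<const __m128i*>(in[k] + i));
            // cmpgt sets a lane to all ones where v > xv (signed compare);
            // mask it down to a single bit before merging.
            const __m128i gt = _mm_and_si128(_mm_cmpgt_epi32(v, xv), one);
            bits = _mm_or_si128(_mm_slli_epi32(bits, 1), gt);
        }
        _mm_storeu_si128(reinterpret_cast<__m128i*>(o + i), bits);
    }
#endif
    // Scalar tail, and the whole job on targets without SSE2.
    for (; i < n; ++i) {
        const std::int32_t xv = x[i];
        std::int32_t bits = 0;
        for (int k = 0; k < 6; ++k) {
            bits = (bits << 1) | (in[k][i] > xv);
        }
        o[i] = bits;
    }
}
```

Whether this beats what the compiler generates from the plain loop depends on the compiler and flags; auto-vectorization may already produce similar code, so it is worth measuring before committing to intrinsics.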
