X86-64 Inline Assembly in C (compiled using GCC), multi-precision multiplication routine causing a seg fault

Question

I'm trying to implement multi-precision multiplication of GMP mpz_t objects in inline x86-64 assembly. Depending on my choice of constraints on the output variable, I either get a segmentation fault or the values in the output variable get corrupted inconsistently (i.e. different runs of the code corrupt the values differently).

What this code does is take two GMP mpz_t objects, ain and bin, that are each guaranteed to have size 13 (i.e. _mp_size is set to 13, so each object is defined by thirteen 64-bit limbs) and then produce an mpz_t object of size 26, res, that is the result of multiplying ain and bin together. I do not use mpz_mul because this method usually results in a performance increase in this particular setting.

Note that res->_mp_d, ain->_mp_d and bin->_mp_d refer to the array of "limbs" that define the respective mpz_t objects, with (obj->_mp_d)[0] being the least significant limb and (obj->_mp_d)[obj->_mp_size-1] being the most significant limb.

If anyone can help explain what I am doing wrong here, I would really appreciate it! Below is a code segment. I have excluded most of the assembly because it is repetitive, but I think what remains gives a good indication of what is going on:

void mpz_mul_x86_1(mpz_t res, mpz_t ain, mpz_t bin){

   if( res->_mp_alloc<26) //the next few lines makes sure res is large enough
     _mpz_realloc(res,26); //the result of the multiplication

   res->_mp_size = 26;


   asm volatile (            
     "movq 0(%1), %%rax;" 
     "mulq 0(%2);"
     "movq %%rax, 0(%0);"    
     "movq %%rdx, %%r8;"           //A0*B0
                                   //0

     "xorq %%r10, %%r10;" 

     "movq 8(%1), %%rax;"      
     "mulq 0(%2);"              
     "addq %%rax, %%r8;"     
     "movq %%rdx, %%r9;"  
     "adcq $0, %%r9;"              //A1*B0

     "movq 0(%1), %%rax;"  
     "mulq 8(%2);"         
     "addq %%rax, %%r8;" 
     "movq %%r8, 8(%0);"  
     "adcq %%rdx,%%r9;"    
     "adcq $0, %%r10;"                //A0*B1
                                     //1

     "xorq %%r8, %%r8;" 

     "movq 0(%1), %%rax;"
     "mulq 16(%2);"            
     "addq %%rax, %%r9;"            
     "adcq %%rdx, %%r10;"
     "adcq $0, %%r8;"           //A0*B2

     "movq 8(%1), %%rax;"
     "mulq 8(%2);"            
     "addq %%rax, %%r9;"            
     "adcq %%rdx, %%r10;"
     "adcq $0, %%r8;"        //A1*B1

     "movq 16(%1), %%rax;"
     "mulq 0(%2);"            
     "addq %%rax, %%r9;"    
     "movq %%r9, 16(%0);" 
     "adcq %%rdx, %%r10;"
     "adcq $0, %%r8;"            //A2*B0
                                 //2
     "xorq %%r9, %%r9;"  

     "movq 24(%1), %%rax;"
     "mulq 0(%2);"            
     "addq %%rax, %%r10;"            
     "adcq %%rdx, %%r8;"
     "adcq $0, %%r9;"              //A3*B0

     "movq 0(%1), %%rax;"
     "mulq 24(%2);"            
     "addq %%rax, %%r10;"            
     "adcq %%rdx, %%r8;"
     "adcq $0, %%r9;"            //A0*B3

     "movq 16(%1), %%rax;"
     "mulq 8(%2);"            
     "addq %%rax, %%r10;"            
     "adcq %%rdx, %%r8;"
     "adcq $0, %%r9;"        //A2*B1

     "movq 8(%1), %%rax;"
     "mulq 16(%2);"            
     "addq %%rax, %%r10;"   
     "movq %%r10, 24(%0);" 
     "adcq %%rdx, %%r8;"
     "adcq $0, %%r9;"        //A1*B2
                             //3


    /*About 1000 lines of omitted Assembly code is from here*/


     "xor %%r8, %%r8;"

     "movq 96(%1), %%rax;"
     "mulq 88(%2);"            
     "addq %%rax, %%r9;"
     "adcq %%rdx, %%r10;"
     "adcq $0, %%r8;"    //A12*B11

     "movq 88(%1), %%rax;"
     "mulq 96(%2);"            
     "addq %%rax, %%r9;"
     "movq %%r9, 184(%0);"
     "adcq %%rdx, %%r10;"
     "adcq $0, %%r8;"    //A11*B12
                         //23
     "xor %%r9, %%r9;"

     "movq 96(%1), %%rax;"
     "mulq 96(%2);"            
     "addq %%rax, %%r10;"
     "movq %%r10, 192(%0);"
     "adcq %%rdx, %%r8;"
     "adcq $0, %%r8;"    //A12*B12
                         //24

     "movq %%r8, 200(%0);" //25


     :  "=&r" (res->_mp_d) 
     : "r" ((ain->_mp_d)), "r" ((bin->_mp_d))
     : "%rax", "%rdx", "%r8", "%r9", "%r10", "memory", "cc"
     );
}

Answer 1

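You are incorrectly declaring res->_mp_d as an output of the asm statement, when it is actually an input: a pointer to the memory where the output is written. With the "=&r" constraint, GCC is free to pick a register for %0 without loading res->_mp_d into it first, so the stores through %0 go to an arbitrary address, and whatever is left in that register is written back to res->_mp_d afterwards.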
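
To illustrate the constraint layout, here is a minimal sketch reduced to a single-limb product rather than the full 13-limb routine. The function name mul_1x1 and the portable fallback branch are assumptions made for this example, not part of the original code. The point is that all three pointers are listed as inputs, and the "memory" clobber is what tells GCC that the asm writes through one of them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes the 128-bit product a[0] * b[0]
 * to res[0] (low limb) and res[1] (high limb). */
void mul_1x1(uint64_t *res, const uint64_t *a, const uint64_t *b)
{
    assert(res && a && b);
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile (
        "movq 0(%1), %%rax\n\t"   /* rax = a[0]                   */
        "mulq 0(%2)\n\t"          /* rdx:rax = rax * b[0]         */
        "movq %%rax, 0(%0)\n\t"   /* res[0] = low 64 bits         */
        "movq %%rdx, 8(%0)\n\t"   /* res[1] = high 64 bits        */
        :                               /* no register outputs    */
        : "r" (res), "r" (a), "r" (b)   /* all pointers are inputs */
        : "rax", "rdx", "memory", "cc");
#else
    /* Fallback for other targets (an assumption so the sketch builds
     * elsewhere); relies on the GCC/Clang unsigned __int128 extension. */
    unsigned __int128 p = (unsigned __int128)a[0] * b[0];
    res[0] = (uint64_t)p;
    res[1] = (uint64_t)(p >> 64);
#endif
}
```

Applied to the code in the question, the same change would mean leaving the output list empty and passing res->_mp_d as the first input operand, so that %0, %1 and %2 keep their current meaning.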

solution1
1 2013-08-22 19:39:53
