I'm trying to implement multi-precision multiplication of GMP mpz_t objects in inline x86 assembly. Depending on my choice of constraints on the output variable, I either get a segmentation fault, or the values in the output variable get corrupted inconsistently (i.e. different runs of the code corrupt the values differently).
What this code does is take two GMP mpz_t objects, ain and bin, that are each guaranteed to have size 13 (i.e. _mp_size is set to 13, so each object is defined by thirteen 64-bit limbs), and produce an mpz_t object of size 26, res, that is the result of multiplying ain and bin together. The reason I do not use mpz_mul is that this hand-written version is usually faster in this particular setting.
Note that res->_mp_d, ain->_mp_d and bin->_mp_d refer to the arrays of "limbs" that define the respective mpz_t objects, with (obj->_mp_d)[0] being the least significant limb and (obj->_mp_d)[obj->_mp_size-1] being the most significant limb.
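For checking the assembly, the computation described above can be sketched in portable C. This is a minimal reference, not the original code: it assumes 64-bit limbs stored as uint64_t (as on a typical x86-64 GMP build) and relies on the GCC/Clang unsigned __int128 extension; ref_mul_n is a hypothetical name. It walks the product column by column, the same order the assembly uses, with c0, c1, c2 playing the role of the rotating r8/r9/r10 accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine: r[0..2n-1] = a[0..n-1] * b[0..n-1].
   Limbs are least significant first, like _mp_d. Computed column by
   column (product scanning), the order used by the inline assembly.
   Assumes 64-bit limbs and the GCC/Clang unsigned __int128 extension. */
static void ref_mul_n(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    assert(n > 0);
    uint64_t c0 = 0, c1 = 0, c2 = 0;   /* 192-bit column accumulator */

    for (size_t k = 0; k < 2 * n - 1; k++) {
        /* All pairs (i, k - i) with both indices inside [0, n). */
        size_t lo = (k < n) ? 0 : k - n + 1;
        size_t hi = (k < n) ? k : n - 1;

        for (size_t i = lo; i <= hi; i++) {
            unsigned __int128 p = (unsigned __int128)a[i] * b[k - i];
            uint64_t plo = (uint64_t)p;
            uint64_t phi = (uint64_t)(p >> 64);

            c0 += plo;                      /* like addq %rax, ... */
            uint64_t carry = (c0 < plo);
            c1 += phi;                      /* like adcq %rdx, ... */
            uint64_t carry2 = (c1 < phi);
            c1 += carry;
            carry2 += (c1 < carry);
            c2 += carry2;                   /* like adcq $0, ... */
        }
        r[k] = c0;                          /* column k is complete */
        c0 = c1;                            /* shift accumulator down */
        c1 = c2;
        c2 = 0;
    }
    r[2 * n - 1] = c0;                      /* the top limb is what remains */
}
```

Comparing its output limb by limb with res->_mp_d after the asm runs is one way to find which column (if any) goes wrong.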
If anyone can help explain what I am doing wrong here, I would really appreciate it! Below is a code segment. I have excluded most of the assembly because it is repetitive, but I think what remains gives a good indication of what is going on:
void mpz_mul_x86_1(mpz_t res, mpz_t ain, mpz_t bin){
if( res->_mp_alloc<26) //the next few lines make sure res is large enough
_mpz_realloc(res,26); //to hold the result of the multiplication
res->_mp_size = 26;
asm volatile (
"movq 0(%1), %%rax;"
"mulq 0(%2);"
"movq %%rax, 0(%0);"
"movq %%rdx, %%r8;" //A0*B0
//0
"xorq %%r10, %%r10;"
"movq 8(%1), %%rax;"
"mulq 0(%2);"
"addq %%rax, %%r8;"
"movq %%rdx, %%r9;"
"adcq $0, %%r9;" //A1*B0
"movq 0(%1), %%rax;"
"mulq 8(%2);"
"addq %%rax, %%r8;"
"movq %%r8, 8(%0);"
"adcq %%rdx,%%r9;"
"adcq $0, %%r10;" //A0*B1
//1
"xorq %%r8, %%r8;"
"movq 0(%1), %%rax;"
"mulq 16(%2);"
"addq %%rax, %%r9;"
"adcq %%rdx, %%r10;"
"adcq $0, %%r8;" //A0*B2
"movq 8(%1), %%rax;"
"mulq 8(%2);"
"addq %%rax, %%r9;"
"adcq %%rdx, %%r10;"
"adcq $0, %%r8;" //A1*B1
"movq 16(%1), %%rax;"
"mulq 0(%2);"
"addq %%rax, %%r9;"
"movq %%r9, 16(%0);"
"adcq %%rdx, %%r10;"
"adcq $0, %%r8;" //A2*B0
//2
"xorq %%r9, %%r9;"
"movq 24(%1), %%rax;"
"mulq 0(%2);"
"addq %%rax, %%r10;"
"adcq %%rdx, %%r8;"
"adcq $0, %%r9;" //A3*B0
"movq 0(%1), %%rax;"
"mulq 24(%2);"
"addq %%rax, %%r10;"
"adcq %%rdx, %%r8;"
"adcq $0, %%r9;" //A0*B3
"movq 16(%1), %%rax;"
"mulq 8(%2);"
"addq %%rax, %%r10;"
"adcq %%rdx, %%r8;"
"adcq $0, %%r9;" //A2*B1
"movq 8(%1), %%rax;"
"mulq 16(%2);"
"addq %%rax, %%r10;"
"movq %%r10, 24(%0);"
"adcq %%rdx, %%r8;"
"adcq $0, %%r9;" //A1*B2
//3
/* About 1000 lines of assembly are omitted here */
"xor %%r8, %%r8;"
"movq 96(%1), %%rax;"
"mulq 88(%2);"
"addq %%rax, %%r9;"
"adcq %%rdx, %%r10;"
"adcq $0, %%r8;" //A12*B11
"movq 88(%1), %%rax;"
"mulq 96(%2);"
"addq %%rax, %%r9;"
"movq %%r9, 184(%0);"
"adcq %%rdx, %%r10;"
"adcq $0, %%r8;" //A11*B12
//23
"xor %%r9, %%r9;"
"movq 96(%1), %%rax;"
"mulq 96(%2);"
"addq %%rax, %%r10;"
"movq %%r10, 192(%0);"
"adcq %%rdx, %%r8;"
"adcq $0, %%r8;" //A12*B12
//24
"movq %%r8, 200(%0);" //25
: "=&r" (res->_mp_d)
: "r" ((ain->_mp_d)), "r" ((bin->_mp_d))
: "%rax", "%rdx", "%r8", "%r9", "%r10", "memory", "cc"
);
}
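A separate point worth checking in the code above: it sets res->_mp_size = 26 unconditionally. Since both inputs have exactly 13 limbs with a nonzero top limb, their product needs either 25 or 26 limbs, and GMP expects the most significant limb of an mpz_t to be nonzero. A minimal sketch of the size computation, using a hypothetical helper normalized_size on a plain uint64_t array (assuming 64-bit limbs; in GMP code the same loop would run over mp_limb_t):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of significant limbs in r[0..n-1], i.e. n
   minus the count of zero limbs at the most significant end. Returns 0 for
   an all-zero array (GMP represents zero with _mp_size == 0). */
static size_t normalized_size(const uint64_t *r, size_t n)
{
    assert(n == 0 || r != NULL);
    while (n > 0 && r[n - 1] == 0)
        n--;
    return n;
}
```

After the multiplication, something like res->_mp_size = (int) normalized_size(res->_mp_d, 26); would then give 25 or 26 as appropriate.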
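You are incorrectly declaring res->_mp_d as an output of the asm statement, when it is actually an input: a pointer to where the output should be stored. With "=&r", GCC allocates a register for %0 but never loads the pointer into it, so the stores to 0(%0), 8(%0), ... go to whatever address that register happens to hold (hence the segfaults or the run-to-run corruption), and after the asm the compiler writes that register back into res->_mp_d, clobbering the limb pointer itself. Pass res->_mp_d as an "r" input alongside ain->_mp_d and bin->_mp_d, leave the output list empty, and keep the "memory" clobber so the compiler knows the limbs are written.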
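A minimal sketch of that fix, scaled down to a 2x2-limb product so it fits here. The hypothetical mul_2x2 follows the same column order and r8/r9/r10 register usage as the question's code; a 13x13 version would differ only in the number of columns. It assumes x86-64 and GCC/Clang extended asm, and falls back to portable C (using the unsigned __int128 extension) on other targets so the sketch still builds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2x2-limb version of the question's routine:
   rp[0..3] = ap[0..1] * bp[0..1], limbs least significant first.
   rp is an INPUT operand: the asm only reads the pointer and stores
   through it; the "memory" clobber tells GCC that memory is written.
   rp must not overlap ap or bp, because rp[0] is stored before the
   inputs are fully read (the assert only catches exact aliasing). */
static void mul_2x2(uint64_t *rp, const uint64_t *ap, const uint64_t *bp)
{
    assert(rp != ap && rp != bp);
#if defined(__x86_64__)
    __asm__ volatile (
        "movq 0(%1), %%rax;"
        "mulq 0(%2);"
        "movq %%rax, 0(%0);"
        "movq %%rdx, %%r8;"       /* A0*B0 */
        "xorq %%r10, %%r10;"
        "movq 8(%1), %%rax;"
        "mulq 0(%2);"
        "addq %%rax, %%r8;"
        "movq %%rdx, %%r9;"
        "adcq $0, %%r9;"          /* A1*B0 */
        "movq 0(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r8;"
        "movq %%r8, 8(%0);"
        "adcq %%rdx, %%r9;"
        "adcq $0, %%r10;"         /* A0*B1 */
        "movq 8(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r9;"
        "movq %%r9, 16(%0);"
        "adcq %%rdx, %%r10;"
        "movq %%r10, 24(%0);"     /* A1*B1 */
        : /* no outputs: the result is written through rp */
        : "r" (rp), "r" (ap), "r" (bp)
        : "%rax", "%rdx", "%r8", "%r9", "%r10", "memory", "cc");
#else
    /* Portable fallback: the same column sums computed in 128-bit arithmetic. */
    unsigned __int128 p00 = (unsigned __int128)ap[0] * bp[0];
    unsigned __int128 p01 = (unsigned __int128)ap[0] * bp[1];
    unsigned __int128 p10 = (unsigned __int128)ap[1] * bp[0];
    unsigned __int128 p11 = (unsigned __int128)ap[1] * bp[1];
    unsigned __int128 t = (p00 >> 64) + (uint64_t)p01 + (uint64_t)p10;
    rp[0] = (uint64_t)p00;
    rp[1] = (uint64_t)t;
    t = (t >> 64) + (p01 >> 64) + (p10 >> 64) + (uint64_t)p11;
    rp[2] = (uint64_t)t;
    t = (t >> 64) + (p11 >> 64);
    rp[3] = (uint64_t)t;
#endif
}
```

For the function in the question, the same change means an empty output list with res->_mp_d moved to the inputs, i.e. : : "r" (res->_mp_d), "r" (ain->_mp_d), "r" (bin->_mp_d) followed by the existing clobber list. The operand numbers %0, %1 and %2 stay the same, so the assembly body does not need to change.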