The code below multiplies array arr1 by arr2 using SSE and stores the result in arr3. Each array has count elements, and arr1, arr2 and arr3 have type float*. The problem is that the compiler doesn't support Intel syntax. What does this code look like in AT&T syntax?
The compiler is GCC 4.4.7.
__asm__ volatile (
".intel_syntax noprefix \n\t"
"loop: \n\t"
"movups xmm0, [eax+edx] \n\t"
"movups xmm1, [ebx+edx] \n\t"
"mulps xmm0, xmm1 \n\t"
"movups [ecx+edx], xmm0 \n\t"
"sub edx, 16 \n\t"
"jnz loop \n\t"
:
: "a"(arr1), "b"(arr2), "c"(arr3), "d"(count)
: "xmm0", "xmm1"
);
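__asm__ volatile (
"1: \n\t"                              /* local label: safe if this block is emitted more than once */
"subq $0x10, %%rdx \n\t"               /* step back 16 bytes (4 floats); sets ZF when the offset reaches 0 */
"movups (%%rax,%%rdx), %%xmm0 \n\t"    /* rdx is a byte offset, so count must hold a nonzero multiple of 16 */
"movups (%%rbx,%%rdx), %%xmm1 \n\t"
"mulps %%xmm1, %%xmm0 \n\t"            /* AT&T order: source first, destination last */
"movups %%xmm0, (%%rcx,%%rdx) \n\t"
"jnz 1b \n\t"                          /* movups/mulps leave the flags alone, so ZF still comes from subq */
: "+d"(count)                          /* rdx is modified, so it is declared as an in/out operand */
: "a"(arr1), "b"(arr2), "c"(arr3)
: "xmm0", "xmm1", "cc", "memory"       /* the asm changes the flags and writes through arr3 */
);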
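arr1, arr2 and arr3 are 8-byte pointers and count is an 8-byte integer, so the 64-bit registers rax, rbx, rcx and rdx are used instead of eax, ebx, ecx and edx. In AT&T syntax the source operand comes first and the destination last, registers take a % prefix (written %% inside GCC extended asm), immediates take a $ prefix, and a memory operand [base+index] is written (base,index).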
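The AT&T loop above can be wrapped in a function so it is easier to test. What follows is only a sketch under some assumptions: mul_arrays is a hypothetical name, not something from the original post; the registers are left to the compiler through named operands instead of being pinned to rax, rbx, rcx and rdx; the offset is computed in bytes from the element count; and elements left over when count is not a multiple of 4 are handled by a plain C loop. On targets other than x86-64 it falls back to scalar code.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): out[i] = a[i] * b[i]. */
static void mul_arrays(const float *a, const float *b, float *out, size_t count)
{
    size_t vec = count & ~(size_t)3;   /* elements that fit in whole groups of 4 */
    size_t i;

#if defined(__GNUC__) && defined(__x86_64__)
    size_t bytes = vec * sizeof(float); /* byte offset, always a multiple of 16 */

    if (bytes != 0) {                   /* the loop body runs at least once, so skip it for 0 */
        __asm__ volatile (
            "1: \n\t"
            "subq $16, %[off] \n\t"                 /* move back one 16-byte group */
            "movups (%[a],%[off]), %%xmm0 \n\t"     /* unaligned load of 4 floats */
            "movups (%[b],%[off]), %%xmm1 \n\t"
            "mulps %%xmm1, %%xmm0 \n\t"             /* xmm0 = xmm0 * xmm1 */
            "movups %%xmm0, (%[out],%[off]) \n\t"   /* unaligned store of 4 floats */
            "jnz 1b \n\t"                           /* ZF was set by subq */
            : [off] "+&r"(bytes)                    /* modified, and must not share a register */
            : [a] "r"(a), [b] "r"(b), [out] "r"(out)
            : "xmm0", "xmm1", "cc", "memory"
        );
    }
#else
    /* Assumption: no SSE inline asm on this target, so use scalar code. */
    for (i = 0; i < vec; i++)
        out[i] = a[i] * b[i];
#endif

    /* Remaining 0 to 3 elements that do not fill a whole SSE register. */
    for (i = vec; i < count; i++)
        out[i] = a[i] * b[i];
}
```

A call such as mul_arrays(arr1, arr2, arr3, count) then takes count as a number of elements rather than bytes. Using movups keeps the sketch valid for unaligned arrays; movaps would only be an option if all three pointers were known to be 16-byte aligned.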