简体   繁体   中英

writing assembly code in c++

I have the following code in C++:

inline void armMultiply(const float* __restrict__ src1,
                        const float* __restrict__ src2,
                        float* __restrict__ dst)
{
    __asm volatile(
                 "vld1.f32 {q0}, [%[src1]:128]!      \n\t"
                 :
                 :[dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
                 );
}

Why do I get the error vector register expected ?

You're getting this error because your inline assembly is for 32 bit arm, but you're compiling for 64 bit arm (with clang - with gcc you would have gotten a different error).

(Inline) assembly is different between 32 and 64 bit arm, so you need to guard it with eg #if defined(__ARM_NEON__) && !defined(__aarch64__) , or if you want to have different assembly for both 64 and 32 bit: #ifdef __aarch64__ .. #elif defined(__ARM_NEON__) , etc.

As others commented, unless you really need to manually handtune the produced assembly, intrinsics can be just as good (and in some cases, better than what you produce yourself). You can eg do the two vld1_f32 calls, one vmul_f32 and one vst1_f32 via intrinsics just fine.

EDIT:

The corresponding inline assembly line for loading into a SIMD register on 64 bit would be:

"ld1 {v0.4s}, [%[src1]], #16      \n\t"

To support both, your function could look like this instead:

inline void armMultiply(const float* __restrict__ src1,
                        const float* __restrict__ src2,
                        float* __restrict__ dst)
{
#ifdef __aarch64__
    __asm volatile(
                 "ld1 {v0.4s}, [%[src1]], #16      \n\t"
                 :
                 :[dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
                 );
#elif defined(__ARM_NEON__)
    __asm volatile(
                 "vld1.f32 {q0}, [%[src1]:128]!      \n\t"
                 :
                 :[dst] "r" (dst), [src1] "r" (src1), [src2] "r" (src2)
                 );
#else
#error this requires neon
#endif
}

Assuming we're talking about GCC, the docs say that you should be using "w" ("Floating point or SIMD vector register") instead of "r" ("register operand is allowed provided that it is in a general register") as the constraint.

https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Machine-Constraints.html#Machine-Constraints

https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Simple-Constraints.html#Simple-Constraints

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM