I have a C application using Intel intrinsics like:
__m128 _mm_add_ps (__m128 a, __m128 b)
__m128 _mm_sub_ps (__m128 a, __m128 b)
__m128 _mm_mul_ps (__m128 a, __m128 b)
__m128 _mm_set_ps (float e3, float e2, float e1, float e0)
void _mm_store_ps (float* mem_addr, __m128 a)
__m128 _mm_load_ps (float const* mem_addr)
Now I am trying to modify my application to make it work on ARMv8, using a simulator called Gem5. So I began looking for ARM intrinsics and found this manual: ARM® NEON™ Intrinsics Reference.
I found the arithmetic intrinsics, but I'm a little lost with the set, store and load intrinsics.
Could anyone with experience with ARM intrinsics tell me the right ones?
Here are a few equivalents to get you started:
SSE             ARM
__m128          float32x4_t   // 4 x 32-bit floats in a vector
_mm_load_ps     vld1q_f32     // load float vector from memory
_mm_store_ps    vst1q_f32     // store float vector to memory
_mm_add_ps      vaddq_f32     // add float vectors
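To make the mapping concrete, here is a minimal sketch of a thin wrapper that compiles to NEON on ARM and to SSE on x86. The names vec4, vec4_load, vec4_store, vec4_add, vec4_sub, vec4_mul and sum_times_diff are made up for illustration and are not part of either intrinsics API. vsubq_f32 and vmulq_f32 are the NEON counterparts of _mm_sub_ps and _mm_mul_ps. Note that the SSE branch uses the unaligned _mm_loadu_ps / _mm_storeu_ps rather than the aligned variants from the question, because vld1q_f32 / vst1q_f32 do not require 16-byte alignment.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef float32x4_t vec4;                      /* hypothetical wrapper type */
static inline vec4 vec4_load(const float *p)    { return vld1q_f32(p); }
static inline void vec4_store(float *p, vec4 v) { vst1q_f32(p, v); }
static inline vec4 vec4_add(vec4 a, vec4 b)     { return vaddq_f32(a, b); }
static inline vec4 vec4_sub(vec4 a, vec4 b)     { return vsubq_f32(a, b); }
static inline vec4 vec4_mul(vec4 a, vec4 b)     { return vmulq_f32(a, b); }
#elif defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vec4;                           /* hypothetical wrapper type */
static inline vec4 vec4_load(const float *p)    { return _mm_loadu_ps(p); }
static inline void vec4_store(float *p, vec4 v) { _mm_storeu_ps(p, v); }
static inline vec4 vec4_add(vec4 a, vec4 b)     { return _mm_add_ps(a, b); }
static inline vec4 vec4_sub(vec4 a, vec4 b)     { return _mm_sub_ps(a, b); }
static inline vec4 vec4_mul(vec4 a, vec4 b)     { return _mm_mul_ps(a, b); }
#else
#error "This sketch needs either NEON or SSE"
#endif

/* out[i] = (a[i] + b[i]) * (a[i] - b[i]), four floats per iteration.
 * n must be a multiple of 4; a real port would also handle the tail. */
static void sum_times_diff(const float *a, const float *b,
                           float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        vec4 va = vec4_load(a + i);
        vec4 vb = vec4_load(b + i);
        vec4_store(out + i, vec4_mul(vec4_add(va, vb), vec4_sub(va, vb)));
    }
}
```

With a wrapper like this the rest of the application only touches the vec4_* names, so the same source can be built for x86 and for the ARMv8 target.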
As for initialising a vector, as you might with e.g. _mm_set_ps in SSE, compilers such as gcc and clang allow you to do this in a slightly more C-like way with Neon data types, e.g.
const float32x4_t v = { 1.0f, 2.0f, 3.0f, 4.0f };
However, if your compiler does not support this method then you may have to use equivalent Neon intrinsics.
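One detail to watch when porting _mm_set_ps is the argument order: _mm_set_ps(e3, e2, e1, e0) puts e0 in the lowest lane, whereas a brace initialiser or an array load fills lanes from lowest to highest. So the initialiser { 1.0f, 2.0f, 3.0f, 4.0f } above corresponds to _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f). The following is only a sketch of one way to do it with intrinsics, by loading from a small temporary array; the names vec4, vec4_set and set_and_store are hypothetical. Other options include vdupq_n_f32 (all lanes the same value, like _mm_set1_ps) and vsetq_lane_f32 (one lane at a time).

```c
#include <assert.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef float32x4_t vec4;                      /* hypothetical wrapper type */

/* Same argument order as _mm_set_ps: e0 ends up in lane 0. */
static inline vec4 vec4_set(float e3, float e2, float e1, float e0)
{
    const float lanes[4] = { e0, e1, e2, e3 }; /* lowest lane first */
    return vld1q_f32(lanes);
}
static inline void vec4_store(float *p, vec4 v) { vst1q_f32(p, v); }
#elif defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vec4;                           /* hypothetical wrapper type */

static inline vec4 vec4_set(float e3, float e2, float e1, float e0)
{
    return _mm_set_ps(e3, e2, e1, e0);
}
static inline void vec4_store(float *p, vec4 v) { _mm_storeu_ps(p, v); }
#else
#error "This sketch needs either NEON or SSE"
#endif

/* Build a vector from four scalars and write it to out[0..3],
 * so the resulting lane order can be inspected from plain C. */
static void set_and_store(float e3, float e2, float e1, float e0, float *out)
{
    assert(out != NULL);
    vec4_store(out, vec4_set(e3, e2, e1, e0));
}
```

Calling set_and_store(4.0f, 3.0f, 2.0f, 1.0f, out) leaves 1.0f in out[0] and 4.0f in out[3] on both targets, which is an easy way to check that the lane order survived the port.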