
NEON cycle counts for M2?

Is there a resource that lists how many cycles each SIMD (NEON) instruction takes on Apple M1/M2, like https://uops.info/table.html or Agner Fog's tables for x86? I wish I could give a bigger bounty, but that's all the rep I have.

I have never programmed on an ARM machine. I took a look at sse2neon:
https://github.com/DLTcollab/sse2neon/blob/7bd15eac51e36bf7426052f8515358cb665d8c04/sse2neon.h

The first thing I looked up was setzero. I doubted that dup was the best way to do it, so I benchmarked it with nanobench. xor came out faster than dup, and sub did not perform the same as xor.

Is there something I can look up to get a rough idea? My target is the M2.

#include <arm_neon.h>
#define ANKERL_NANOBENCH_IMPLEMENT
#include "nanobench.h"

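// Zero via broadcast of an immediate.
int32x4_t setzeroA()
{
    return vdupq_n_s32(0);
}
// Zero via v - v. v is deliberately left uninitialized; the compiler
// is free to fold this into a plain zeroing instruction.
int32x4_t setzeroB()
{
    int32x4_t v;
    return vsubq_s32(v, v); // signed variant, to match int32x4_t
}
// Zero via v ^ v, again on a deliberately uninitialized register.
uint8x16_t setzeroC()
{
    uint8x16_t v;
    return veorq_u8(v, v);
}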

int main() {
    ankerl::nanobench::Bench().run("Set", [&] {
        auto v = setzeroA();
        ankerl::nanobench::doNotOptimizeAway(v);
    });
    ankerl::nanobench::Bench().run("sub", [&] {
        auto v = setzeroB();
        ankerl::nanobench::doNotOptimizeAway(v);
    });
    ankerl::nanobench::Bench().run("xor", [&] {
        auto v = setzeroC();
        ankerl::nanobench::doNotOptimizeAway(v);
    });
}

These tables were measured on the M1, but I doubt anything major has changed with the M2.

Big (Firestorm performance cores): https://dougallj.github.io/applecpu/firestorm-simd.html

Little (Icestorm efficiency cores): https://dougallj.github.io/applecpu/icestorm-simd.html
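
Instruction tables of this kind usually give two numbers per instruction, a latency and a throughput, and a micro-benchmark only approximates one of them depending on whether consecutive operations depend on each other. Here is a minimal sketch of that difference for a vector add, assuming GCC or Clang (the empty inline asm is a compiler extension). The names `dependent_chain`, `independent_chains`, `ns_per_add`, `report` and `KEEP` are made up for this illustration, and off AArch64 a scalar stand-in is used so the sketch still compiles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

#if defined(__aarch64__)
#include <arm_neon.h>
using vec = uint32x4_t;
static inline vec splat(uint32_t x) { return vdupq_n_u32(x); }
static inline vec add(vec a, vec b) { return vaddq_u32(a, b); }
static inline uint32_t lane0(vec v) { return vgetq_lane_u32(v, 0); }
// Empty asm: keeps v in a SIMD register and hides its value from the
// optimizer, so the loops below are not folded into a closed form.
#define KEEP(v) __asm__ __volatile__("" : "+w"(v))
#else
// Scalar stand-in (assumption: only used so the sketch builds off ARM).
using vec = uint32_t;
static inline vec splat(uint32_t x) { return x; }
static inline vec add(vec a, vec b) { return a + b; }
static inline uint32_t lane0(vec v) { return v; }
#define KEEP(v) __asm__ __volatile__("" : "+r"(v))
#endif

// Latency-bound: each add needs the result of the previous one.
uint32_t dependent_chain(uint32_t iters) {
    vec acc = splat(0);
    const vec one = splat(1);
    for (uint32_t i = 0; i < iters; ++i) {
        acc = add(acc, one);
        KEEP(acc);
    }
    return lane0(acc);
}

// Closer to throughput-bound: four accumulators that never depend on
// each other, so the core can overlap them.
uint32_t independent_chains(uint32_t iters) {
    vec a = splat(0), b = splat(0), c = splat(0), d = splat(0);
    const vec one = splat(1);
    for (uint32_t i = 0; i < iters; ++i) {
        a = add(a, one); KEEP(a);
        b = add(b, one); KEEP(b);
        c = add(c, one); KEEP(c);
        d = add(d, one); KEEP(d);
    }
    return lane0(add(add(a, b), add(c, d)));
}

// Wall-clock nanoseconds per add for one of the functions above.
template <typename F>
double ns_per_add(F f, uint32_t iters, uint32_t adds_per_iter) {
    auto t0 = std::chrono::steady_clock::now();
    uint32_t r = f(iters);
    auto t1 = std::chrono::steady_clock::now();
    assert(r == iters * adds_per_iter);  // sanity check on the result
    (void)r;
    std::chrono::duration<double, std::nano> dt = t1 - t0;
    return dt.count() / (double(iters) * adds_per_iter);
}

void report(uint32_t iters = 100000000) {
    std::printf("dependent:   %.3f ns/add\n",
                ns_per_add(dependent_chain, iters, 1));
    std::printf("independent: %.3f ns/add\n",
                ns_per_add(independent_chains, iters, 4));
}
```

Calling `report()` from `main` in an optimized build prints both figures. Multiplying them by the core's clock frequency gives a rough cycle count to compare against the tables. Four chains may not be enough to saturate every SIMD unit, and loop overhead is included in both numbers, so treat the independent figure as an upper bound on the cost per add, not as an exact reciprocal throughput.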
