Detect ARMv8 A53 vs A57 architecture at runtime?

I'm benchmarking a library against ARMv8 machines. I have four Cortex-A53 dev-boards, and our NEON intrinsics implementation outperforms the C/C++ implementation by about 30%. This is expected.

The GCC compile farm offers a Softiron Overdrive 1000. It's a Cortex-A57 server board, and there the C/C++ code outperforms the intrinsics implementation by about 50%. This was surprising.

We'd like to use our NEON implementation for the A53, but use the C/C++ implementation for the A57. We have code that can make runtime feature selections, like HasNEON(), HasCRC(), HasAES() and HasSHA(). We don't have anything for the CPU model, like A53 vs A57.

My question is, how do we detect an A53 vs A57 at runtime?


We have similar code on x86 for the P4 processor, which has some slow word operations. We detect the P4 by checking CPUID bits, but ARM systems are different. On ARM, the CPUID-like operation is a system register read (an MRS instruction), and it usually requires a higher privilege level (EL1 or above).


If interested, the Cortex-A57 is slower for a particular hash algorithm because the algorithm relies heavily on shifts, rotates and XORs. The A57 optimization guide tells us shifts and rotates are more expensive: a shift takes 4 or 5 cycles in the ASIMD coprocessor, and only the F1 pipe can perform the operation (per section 3.14).

It could also be that the Cortex-A53 has the same penalty, but its integer unit is slower, so the non-NEON code does not outperform the NEON code there.

Have a tune() function that's called during process initialisation. It benchmarks your implementation against GCC's implementation and caches the result (e.g. in a global variable such as bool isMyImplementationFaster).

If your implementation is faster you could assume it's an A53 (and if it's slower you could assume it's an A57). Note that this causes a problem/confusion for CPUs (including future CPUs) that are neither A53 nor A57. However, I'm hoping you'll realise that you don't actually care if it's an A53 or an A57 (or something else), and that you only care whether your implementation is faster or slower.
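A minimal sketch of such a tune() function, assuming both implementations can be wrapped in a common callable. The HashFn signature, the IsFaster and BestTime helpers, the buffer size and the round count are all hypothetical choices for illustration, not part of any real library:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical common signature for the two implementations being compared.
using HashFn = std::function<void(const std::uint8_t* data, std::size_t len)>;

// Cached result of the startup benchmark (variable name taken from the text above).
static bool isMyImplementationFaster = false;

// Run fn several times over the same buffer and keep the shortest run,
// which is less sensitive to scheduling noise than an average.
static std::chrono::nanoseconds BestTime(const HashFn& fn,
                                         const std::vector<std::uint8_t>& buf,
                                         int rounds)
{
    using clock = std::chrono::steady_clock;
    std::chrono::nanoseconds best = std::chrono::nanoseconds::max();
    for (int i = 0; i < rounds; ++i) {
        const clock::time_point start = clock::now();
        fn(buf.data(), buf.size());
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
        if (elapsed < best) best = elapsed;
    }
    return best;
}

// Returns true when `mine` ran faster than `reference` on this machine.
bool IsFaster(const HashFn& mine, const HashFn& reference)
{
    const std::vector<std::uint8_t> buf(64 * 1024, 0xA5);  // arbitrary test input
    const int rounds = 16;                                  // arbitrary round count
    return BestTime(mine, buf, rounds) < BestTime(reference, buf, rounds);
}

// Call once during process initialisation; later code only reads the cached flag.
void tune(const HashFn& mine, const HashFn& reference)
{
    isMyImplementationFaster = IsFaster(mine, reference);
}
```

The rest of the program would then branch on isMyImplementationFaster instead of on a CPU model. Taking the minimum over a few rounds is one design choice among several; a real benchmark would probably also want a warm-up run and an input size that resembles the actual workload.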

In general, as you and others have noted, the real cpuid-like instructions aren't available from user mode code. In practice, the relevant info is handled in platform specific ways.

On Linux, you can try parsing /proc/cpuinfo, if available/readable. The CPU implementer/architecture/variant/part numbers should identify different CPUs pretty well. IIRC this file should also be readable on Android.

For other OSes, the OS would need to provide the necessary info somewhere, and probably not all of them do.

EDIT: A Cortex-A53 that I looked at has got the following info in /proc/cpuinfo:

CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03

While a Cortex-A57 has got the following:

CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd07

However, as Brendan pointed out, trying to match these is pretty futile as the number of different individual cores constantly grows.

Additionally, some SoCs have a heterogeneous set of cores, see big.LITTLE. E.g. the Snapdragon 810 has got 4 Cortex-A53 cores and 4 Cortex-A57 cores. Your threads will be scheduled and moved across these cores as the kernel's scheduler sees fit. In that case, benchmark numbers that you got at startup might not match the cores that the code ends up scheduled on later.
