
Optimize the library for ARM

I am writing an Android application that uses the SoX library. The library puts a very heavy load on the ARM processor. Where can I read about how to optimize a library for ARM? Can someone help?

I've been optimizing code in assembly for quite some time, starting with the MC68000 on the Amiga, then mainly ARM9E (ARMv5E). ARM11 was fine with its new SIMD-like instructions and saturating arithmetic. Then came Cortex.

You know what? NEON, which came bundled with the Cortex-A series, took away my whole motivation for optimizing for ARM.

Unoptimized NEON code runs roughly 5x faster out of the box than assembly-optimized ARM code, and it's so much easier than ARM itself: where I had to struggle hard to get things to work, NEON almost always has a fitting instruction that does exactly the same thing, or something even more accurate, on multiple elements at once.
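To make "on multiple elements at once" concrete for audio work like SoX, here is a plain C model of a saturating 16-bit mix. It is only a sketch: the names `sat_add_s16` and `mix_s16` are made up for illustration and are not part of SoX. The scalar loop handles one sample per iteration, whereas a single NEON saturating add (`VQADD.S16` on a 128-bit register) performs the same clamp on eight samples.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating 16-bit add: clamp to the int16 range instead of wrapping.
 * (Hypothetical helper name, for illustration only.) */
static int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Mix two PCM buffers one sample at a time.
 * A NEON version would do eight of these saturating adds
 * with one VQADD.S16 on a 128-bit (Q) register. */
void mix_s16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = sat_add_s16(a[i], b[i]);
}
```

The compare-and-clamp branches in the scalar version are exactly the kind of work that the saturating instructions remove.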

I read that the ARM instruction timings changed a lot with Cortex, in addition to the dual-issue capability, which means I would have to do many things differently than on ARM9 for maximum performance. But honestly, I don't care anymore. NEON is the way to go.

Bye-bye, ARM.

Don't waste your time on ARM, and especially not on NEON intrinsics. Start studying NEON instead.

An excellent introduction to NEON: http://bit.ly/8XzPXM

You haven't specified your target hardware. Android devices range from low-end ARMv5E processors up to the latest Tegra3. If you want your code to run well on the widest variety of devices, you will need to support ARMv5 (which doesn't have NEON). Even the Tegra2 (currently the most popular CPU for Android tablets) lacks NEON support. You can address this in Android with a "fat binary", which contains both ARMv5 and ARMv7 code in a single APK.

Some general rules about optimizing ARM code:

1) ARMv5/ARMv6 processors have tiny caches. Optimize your data set to fit in the smallest space, and reuse buffers instead of constantly allocating and freeing them, so that they are not evicted from the cache.

2) ARMv5/ARMv6 processors have only 4 write buffers. This means that in tight loops, writing bytes or shorts runs at about half the speed of writing longs, because the narrower stores tie up the write buffers.

3) For memory-bound data processing loops, prefetch data into the cache (PLD instruction). It can generally speed things up by another 20-25%.

4) For code which manipulates bits/bytes, writing in ASM is usually a good idea, since higher-level languages don't do a great job of working with that type of data.
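Rules 2 and 3 can be sketched in C roughly as follows. This is an illustration under stated assumptions, not a tuned implementation: the function names are hypothetical, `__builtin_prefetch` and `__builtin_assume_aligned` are GCC/Clang builtins (on ARM targets that support it the prefetch is typically emitted as `PLD`), and the prefetch distance and the 32-byte cache line are guesses that would need measuring on the real device.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rule 2: fill a 16-bit buffer with 32-bit stores where possible,
 * so each store carries two samples instead of one. */
void fill_u16(uint16_t *dst, uint16_t value, size_t count)
{
    /* Reach 4-byte alignment with at most one halfword store. */
    if (count > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = value;
        count--;
    }
    uint32_t pair = ((uint32_t)value << 16) | value;
    while (count >= 2) {
        /* memcpy keeps this valid C; with the alignment hint the
         * compiler is expected to emit a single word store. */
        memcpy(__builtin_assume_aligned(dst, 4), &pair, sizeof pair);
        dst += 2;
        count -= 2;
    }
    if (count)
        *dst = value;   /* odd trailing element */
}

/* Rule 3: sum samples while prefetching data that will be needed soon. */
int64_t sum_s16(const int16_t *src, size_t count)
{
    enum { PREFETCH_AHEAD = 64 };  /* elements ahead; assumed tuning value */
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        /* One prefetch per 16 samples (32 bytes, an assumed line size),
         * and never past the end of the buffer. */
        if ((i & 15) == 0 && i + PREFETCH_AHEAD < count)
            __builtin_prefetch(src + i + PREFETCH_AHEAD);
        total += src[i];
    }
    return total;
}
```

Whether either change pays off depends on the core and the memory system, so time both versions on the target hardware before keeping them.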

LB


 