I need to move a small 2D array of values around a much larger 2D array of values and, at each position, set any values of the larger array that are greater than the corresponding values in the smaller array to the values of the smaller array. Think image compositing, sort of, but with two 2D arrays of floats. I need to do this a ton of times, as fast as possible. Is there some way to optimize this using NEON assembly, the Accelerate framework, or some other method I haven't heard of? Is anything going to be much faster than a doubly nested for loop that compares and replaces values? For example, would it be faster to store the values as a 1D array instead of a 2D array? Or to access the values across rows rather than down each column? I'm just trying to squeeze out any extra speed I can get, but I'm not sure how.
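For reference, here is a minimal sketch of the plain doubly nested loop the question describes. It assumes both arrays are stored as one contiguous row-major block of floats (the "1D array" layout) and that the inner loop walks along a row, so it touches consecutive addresses. The function name `clamp_region` and its parameters are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for every (r, c) covered by `small` placed at (x, y),
 * big[y+r][x+c] = min(big[y+r][x+c], small[r][c]).
 * Both arrays are contiguous and row-major: element (row, col) of an array
 * that is `w` floats wide lives at index row * w + col. */
static void clamp_region(float *big, size_t big_w, size_t big_h,
                         const float *small, size_t small_w, size_t small_h,
                         size_t x, size_t y)
{
    /* The small array must fit entirely inside the big one at (x, y). */
    assert(x + small_w <= big_w && y + small_h <= big_h);

    for (size_t r = 0; r < small_h; r++) {
        /* Start of the matching rows in each array. */
        float *dst = big + (y + r) * big_w + x;
        const float *src = small + r * small_w;
        /* Inner loop runs along a row, i.e. through consecutive addresses. */
        for (size_t c = 0; c < small_w; c++) {
            if (dst[c] > src[c])
                dst[c] = src[c];
        }
    }
}
```

Keeping the column index in the inner loop matters more than whether the array is declared as `float[h][w]` or as a flat `float *`: a C 2D array is already laid out row by row, so the layouts are the same in memory, but walking down columns jumps `big_w` floats per step and uses the cache poorly.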
I don't know of any functions in the Accelerate framework that will do what you want. You can definitely use NEON to accelerate it without going directly to assembly language: use the vmin_f32 intrinsic to process two pairs of floats at a time, or vminq_f32 to process four pairs at a time.
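As a sketch of the vminq_f32 approach, the inner loop below loads four floats from each row with vld1q_f32, takes the lane-wise minimum, and stores the result back with vst1q_f32; leftover columns are handled by a scalar tail. The function name is hypothetical, and the `#if` guard (an assumption about how you build) lets the same file compile on non-NEON targets, where the scalar loop does all the work. It also assumes the data contains no NaNs: vminq_f32 returns NaN if either input is NaN, whereas the scalar `>` test leaves the big array's value alone when the small array holds a NaN.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical NEON variant: same result as the plain double loop,
 * but four floats per iteration when NEON is available. */
static void clamp_region_neon(float *big, size_t big_w, size_t big_h,
                              const float *small, size_t small_w, size_t small_h,
                              size_t x, size_t y)
{
    assert(x + small_w <= big_w && y + small_h <= big_h);

    for (size_t r = 0; r < small_h; r++) {
        float *dst = big + (y + r) * big_w + x;
        const float *src = small + r * small_w;
        size_t c = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        /* Four lanes at a time: load both rows, lane-wise min, store back. */
        for (; c + 4 <= small_w; c += 4) {
            float32x4_t a = vld1q_f32(dst + c);
            float32x4_t b = vld1q_f32(src + c);
            vst1q_f32(dst + c, vminq_f32(a, b));
        }
#endif
        /* Scalar tail, or the whole row when NEON is unavailable. */
        for (; c < small_w; c++) {
            if (dst[c] > src[c])
                dst[c] = src[c];
        }
    }
}
```

Whether this beats what the compiler already does with the scalar loop is worth measuring: at higher optimization levels, clang and GCC can often auto-vectorize a simple min loop like this one.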
These links might help get you started using the intrinsics, but I don't really have any better advice for you:
How to use the multiply and accumulate intrinsics in ARM Cortex-a8?
ARM Information Center - NEON Intrinsics
ARM NEON Optimization. An Example
I found those by googling "neon intrinsics tutorial".
Also, the developer tools package includes some ARM architecture documentation:
Xcode 4.2: /Developer/Library/PrivateFrameworks/DTISAReferenceGuide.framework/Versions/A/Resources/ARMISA.pdf
Xcode 4.3: /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/Frameworks/DTISAReferenceGuide.framework/Versions/A/Resources/ARMISA.pdf
If you need to compare one-dimensional C arrays of structs, you can try memcmp() to see if it's more efficient than a for loop. If you can afford to keep some sort of hash for each array, you may dramatically improve performance in cases where the arrays differ. For example, if you have an array of floats, you can use their sum as the hash. If two arrays' hashes differ, you don't have to compare the arrays element by element at all. On the other hand, if you expect the arrays to be equal most of the time, calculating the hash will only slow things down.
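A minimal sketch of the sum-as-hash idea, assuming the hash is computed once when an array is created or modified and then reused across many comparisons (otherwise computing it costs a full pass anyway). The `HashedArray` struct and the helper names are hypothetical. The sums are compared by bit pattern rather than with `!=`, so arrays containing NaNs (whose sum is NaN, and NaN != NaN) still compare equal to themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: the data plus a cached "hash" (its float sum). */
typedef struct {
    const float *data;
    size_t len;
    float sum;
} HashedArray;

static HashedArray make_hashed(const float *data, size_t len)
{
    assert(data != NULL);
    float s = 0.0f;
    for (size_t i = 0; i < len; i++)
        s += data[i];
    HashedArray h = { data, len, s };
    return h;
}

/* Bitwise equality with a cheap early-out: identical contents always give
 * an identical sum, so different sums prove the arrays differ. Equal sums
 * prove nothing, so fall back to a full memcmp(). */
static bool hashed_equal(const HashedArray *a, const HashedArray *b)
{
    if (a->len != b->len)
        return false;
    if (memcmp(&a->sum, &b->sum, sizeof a->sum) != 0)
        return false; /* fast rejection without touching the elements */
    return memcmp(a->data, b->data, a->len * sizeof(float)) == 0;
}
```

Note that two different arrays can share a sum (for example the same values in a different order), which is why the memcmp() fallback is still needed.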
Being creative with the hash calculation may help, too. For 2D arrays, the hash may be a polynomial of the 1D (row) array hashes, or even a struct with metadata such as the array sizes, a hash of the 1D array hashes, etc.
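One way to sketch the 2D variant: hash each row as the bit pattern of its float sum, combine the row hashes as a polynomial in a fixed multiplier (wrapping modulo 2^64), and keep the result in a struct alongside the dimensions. The `Hash2D` struct, the helper names, and the multiplier are hypothetical choices for illustration. Because each row is weighted by its position, swapping two rows changes the hash, which a single sum over all elements would not detect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata for a 2D array: its size plus a combined hash. */
typedef struct {
    size_t rows, cols;
    uint64_t hash;
} Hash2D;

/* Row hash: the row's float sum, reinterpreted as its 32-bit pattern. */
static uint32_t row_hash(const float *row, size_t cols)
{
    float s = 0.0f;
    for (size_t c = 0; c < cols; c++)
        s += row[c];
    uint32_t bits;
    memcpy(&bits, &s, sizeof bits);
    return bits;
}

/* Polynomial combination: h = ((h0 * P + h1) * P + h2) ..., mod 2^64. */
static Hash2D hash_2d(const float *data, size_t rows, size_t cols)
{
    assert(data != NULL);
    const uint64_t P = 1099511628211ULL; /* arbitrary odd multiplier */
    uint64_t h = 0;
    for (size_t r = 0; r < rows; r++)
        h = h * P + row_hash(data + r * cols, cols);
    Hash2D out = { rows, cols, h };
    return out;
}

/* false: the arrays definitely differ. true: still needs a full compare. */
static bool maybe_equal(const Hash2D *a, const Hash2D *b)
{
    return a->rows == b->rows && a->cols == b->cols && a->hash == b->hash;
}
```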
EDIT: on my machine, memcmp() is about 2 times faster than a straightforward single-threaded for loop when comparing large arrays of floats in the worst-case scenario (when the arrays are equal).
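The two approaches being timed can be sketched as below (function names hypothetical). One caveat worth knowing before swapping one for the other: memcmp() compares bytes, not float values, so the two can disagree. The loop treats `0.0f` and `-0.0f` as equal while memcmp() does not, and the loop treats NaN as unequal to itself while memcmp() reports two identical NaN bit patterns as equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Element-by-element comparison using floating-point ==. */
static bool floats_equal_loop(const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return false;
    return true;
}

/* Byte-wise comparison of the whole block in one call. */
static bool floats_equal_memcmp(const float *a, const float *b, size_t n)
{
    /* memcmp() must not be given NULL pointers, even when n == 0. */
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n * sizeof(float)) == 0;
}
```

If the arrays never contain signed zeros or NaNs, both functions give the same answer and the choice comes down to speed.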