What functions can I use in Accelerate.framework
to scale a vector by a scalar, and normalize a vector? I found one in the documentation that I think might work for scaling, but I am confused about its operation.
vDSP_vsma
Vector scalar multiply and vector add; single precision.
void vDSP_vsma (
const float *__vDSP_A,
vDSP_Stride __vDSP_I,
const float *__vDSP_B,
const float *__vDSP_C,
vDSP_Stride __vDSP_K,
float *__vDSP_D,
vDSP_Stride __vDSP_L,
vDSP_Length __vDSP_N
);
The easiest way to normalize a vector in-place is something like
int n = 3;
float v[3] = {1, 2, 3};
cblas_sscal(n, 1.0 / cblas_snrm2(n, v, 1), v, 1);
You'll need to #include <cblas.h> or #include <vblas.h> (or both). Note that several of the functions are in the "matrix" section even though they operate on vectors.
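On the scaling half of the question: vDSP_vsma is a multiply-add (D = A * B + C, where B is a pointer to a single scalar), so it does more than a plain scale. vDSP_vsmul is the plain vector-times-scalar call. Here is a minimal sketch of both with unit strides. The helper names scale_vec and scale_add_vec are made up for illustration, and the #else branches are portable stand-ins so the snippet also builds where Accelerate is not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

/* out[i] = in[i] * s. Plain scaling; in and out may be the same array.
   scale_vec is a hypothetical helper name, not part of Accelerate. */
static void scale_vec(const float *in, float s, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
#if defined(__APPLE__)
    vDSP_vsmul(in, 1, &s, out, 1, n);   /* the scalar is passed by pointer */
#else
    for (size_t i = 0; i < n; i++)      /* portable stand-in */
        out[i] = in[i] * s;
#endif
}

/* out[i] = a[i] * s + c[i]. This is what vDSP_vsma computes with unit strides.
   scale_add_vec is a hypothetical helper name as well. */
static void scale_add_vec(const float *a, float s, const float *c,
                          float *out, size_t n)
{
    assert(a != NULL && c != NULL && out != NULL);
#if defined(__APPLE__)
    vDSP_vsma(a, 1, &s, c, 1, out, 1, n);
#else
    for (size_t i = 0; i < n; i++)      /* portable stand-in */
        out[i] = a[i] * s + c[i];
#endif
}
```

If all you want is to scale in place, scale_vec(v, 2.0f, v, n) is enough; cblas_sscal(n, 2.0f, v, 1) does the same thing through BLAS.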
If you want to use the vDSP functions, see the Vector-Scalar Division section. There are several things you can do:
- vDSP_dotpr(), sqrt(), and vDSP_vsdiv()
- vDSP_dotpr(), vrsqrte_f32(), and vDSP_vsmul() (vrsqrte_f32() is a NEON GCC built-in, though, so you need to check you're compiling for armv7)
- vDSP_rmsqv(), multiply by sqrt(n), and vDSP_vsdiv()
The reason there isn't a vector-normalization function is that the "vector" in vDSP means "lots of things at once" (up to around 4096/8192) and not necessarily the "vector" from linear algebra. It's pretty meaningless to normalize a 1024-element vector, and a quick function for normalizing a 3-element vector isn't something that will make your app significantly faster.
The intended usage of vDSP is more like normalizing 1024 2- or 3-element vectors. I can spot a handful of ways to do this:
- vDSP_vdist() to get a vector of lengths, followed by vDSP_vdiv(). You have to use vDSP_vdist() multiple times for vectors of length greater than 2, though.
- vDSP_vsq() to square all the inputs, vDSP_vadd() multiple times to add all of them, the equivalent of vDSP_vsqrt() or vDSP_vrsqrt(), and vDSP_vmul() or vDSP_vdiv() as appropriate. It shouldn't be too hard to write the equivalent of vDSP_vsqrt() or vDSP_vrsqrt().
Of course, if you don't have 1024 vectors to normalize, don't overcomplicate things.
Notes:
L1 data caches have been 32K for around a decade or more (they may be shared between virtual cores in a hyperthreaded CPU, and some older/cheaper processors might have 16K), so the most you should do is around 8192 for in-place operation on floats. You might want to subtract a little for stack space, and if you're doing several sequential operations you probably want to keep it all in cache; 1024 or 2048 seem pretty sensible and any more will probably hit diminishing returns. If you care, measure performance...
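The arithmetic behind the 8192 figure is just 32K divided by 4 bytes per float. If you have more data than that, one way to keep several sequential operations in cache is to walk the array in blocks and run all the passes on each block before moving on. This is a plain-C sketch under those assumptions: L1_BYTES, CHUNK, for_each_chunk and square_then_halve are all made-up names, and square_then_halve just stands in for whatever sequence of vDSP calls you would actually run per block.

```c
#include <assert.h>
#include <stddef.h>

static const size_t L1_BYTES = 32 * 1024;  /* assumed L1 data cache size */
static const size_t CHUNK    = 1024;       /* elements per block, as suggested above */

/* Upper bound on the number of floats that fit in a cache of this size. */
static size_t floats_in_cache(size_t cache_bytes)
{
    return cache_bytes / sizeof(float);
}

typedef void (*chunk_op)(float *block, size_t count);

/* Call op on consecutive blocks of at most CHUNK elements.
   Returns the number of blocks processed. */
static size_t for_each_chunk(float *data, size_t n, chunk_op op)
{
    assert(data != NULL && op != NULL);
    size_t calls = 0;
    for (size_t off = 0; off < n; off += CHUNK) {
        size_t count = (n - off < CHUNK) ? n - off : CHUNK;
        op(data + off, count);
        calls++;
    }
    return calls;
}

/* Example op: two sequential passes over the same block, so the second
   pass finds the block still in cache. */
static void square_then_halve(float *block, size_t count)
{
    for (size_t i = 0; i < count; i++)
        block[i] *= block[i];
    for (size_t i = 0; i < count; i++)
        block[i] *= 0.5f;
}
```

Whether 1024, 2048 or something else is the right block size is exactly the sort of thing to measure rather than guess.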