I am implementing a sequential algorithm (a Kalman filter) with a particular structure in which a lot of the inner looping can be done in parallel. I need to get as much performance out of this function as possible. It currently runs in about 600 ms on my machine with representative data inputs (n, p = 12, d = 3, T = 3000).

I have used `@numba.jit` with `nopython=True, parallel=True` and annotated my ranges with `numba.prange`. However, even with very large data inputs (n > 5000) there is clearly no parallelism occurring (judging by core usage in `top`).

There is quite a bit of code here, so I'm showing only the main chunk. Is there a reason Numba wouldn't be able to parallelize the array operations under the `prange`? I have also checked `numba.config.NUMBA_NUM_THREADS` (it is 8) and played with different `numba.config.THREADING_LAYER` settings (it is currently `'tbb'`). I have also tried both the OpenBLAS and MKL builds of numpy+scipy; the MKL version appears to be slightly slower, and there is still no parallelization.
The annotation is:
```python
@numba.jit(nopython=True, cache=False, parallel=True,
           fastmath=True, nogil=True)
```
And the main part of the function:
```python
P = np.empty((T + 1, n, p, d, d))
m = np.empty((T + 1, n, p, d))
P[0] = P0
m[0] = m0
phi = 0.0
Xt = np.empty((n, p))
for t in range(1, T + 1):
    sum_P00 = 0.0
    v = y[t - 1]
    # Purely for convenience, little performance impact
    for tau in range(1, p + 1):
        Xt[:, tau - 1] = X[p + t - 1 - tau]
    # Predict
    for i in numba.prange(n):
        for tau in range(p):
            # Prediction step
            m[t, i, tau] = Phi[i, tau] @ m[t - 1, i, tau]
            P[t, i, tau] = Phi[i, tau] @ P[t - 1, i, tau] @ Phi[i, tau].T
    # Auxiliary gain variables
    for i in numba.prange(n):
        for tau in range(p):
            v = v - Xt[i, tau] * m[t, i, tau, 0]
            sum_P00 = sum_P00 + P[t, i, tau, 0, 0]
    # Energy function update
    s = np.linalg.norm(Xt)**2 * sum_P00 + sv2
    phi += np.pi * s + 0.5 * v**2 / s
    # Update
    for i in numba.prange(n):
        for tau in range(p):
            k = Xt[i, tau] * P[t, i, tau, :, 0]  # Gain
            m[t, i, tau] = m[t, i, tau] + (v / s) * k
            P[t, i, tau] = P[t, i, tau] + (k / s) @ k.T
```
It appears to simply have been a problem with running interactively in IPython. Running a test script from the console leads to parallel execution, as expected.