Is there a way to speed up the following code snippet? This is a function that accepts lidar points and converts them to a Range View image. Any suggestions would be appreciated. I tried using numba but didn't get much improvement.
import numpy as np

def lidar_rv_projection(points, proj_H=32, proj_W=2048, proj_fov_up=10.0, proj_fov_down=-30.0):
    v_fov_up = proj_fov_up / 180.0 * np.pi
    v_fov_down = proj_fov_down / 180.0 * np.pi
    v_fov_total = abs(v_fov_down) + abs(v_fov_up)
    depth = np.linalg.norm(points[:, :3], 2, axis=1)
    x_points = points[:, 0]
    y_points = points[:, 1]
    z_points = points[:, 2]
    x_img = np.arctan2(y_points, x_points) * -1
    y_img = np.arcsin(z_points / depth)
    proj_x = 0.5 * (x_img / np.pi + 1.0)
    proj_y = 1.0 + (y_img + abs(v_fov_down)) * -1 / v_fov_total
    proj_x *= proj_W
    proj_y *= proj_H
    proj_x = np.floor(proj_x)
    proj_x = np.minimum(proj_W - 1, proj_x)
    proj_x = np.maximum(0, proj_x).astype(np.int32)  # in [0, W-1]
    proj_y = np.floor(proj_y)
    proj_y = np.minimum(proj_H - 1, proj_y)
    proj_y = np.maximum(0, proj_y).astype(np.int32)  # in [0, H-1]
    order = np.argsort(depth)[::-1]  # far-to-near, so closer points overwrite
    depth = depth[order]
    points = points[order]
    proj_y = proj_y[order]
    proj_x = proj_x[order]
    proj_rv_img = np.full((4, proj_H, proj_W), -1, dtype=np.float64)
    proj_rv_img[0, proj_y, proj_x] = depth         # range
    proj_rv_img[1, proj_y, proj_x] = points[:, 2]  # height z
    proj_rv_img[2, proj_y, proj_x] = points[:, 3]  # intensity r
    proj_rv_img[3, proj_y, proj_x] = 1             # binary mask
    return proj_rv_img, proj_x, proj_y, points
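One cheap numpy-level change, independent of any package switch, is to fuse each floor/minimum/maximum chain into a single np.clip call, which avoids two intermediate arrays per coordinate. A minimal sketch (the raw proj_x values and proj_W here stand in for the ones computed inside the function):

```python
import numpy as np

def clip_to_pixels(coords, size):
    # Fuse floor + minimum + maximum into one clip, then cast once.
    return np.clip(np.floor(coords), 0, size - 1).astype(np.int32)

# Check equivalence with the original three-step version:
proj_W = 2048
proj_x = np.random.uniform(-10, proj_W + 10, size=100_000)

fused = clip_to_pixels(proj_x, proj_W)

stepped = np.floor(proj_x)
stepped = np.minimum(proj_W - 1, stepped)
stepped = np.maximum(0, stepped).astype(np.int32)

assert np.array_equal(fused, stepped)
```

The same helper applies to proj_y with size=proj_H.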
If you're considering other numerical packages, using torch or tensorflow (especially if you have access to a GPU) may help substantially. Fortunately, most torch and tensorflow functions are implemented similarly to their numpy counterparts, so you probably wouldn't have to change too many calls.
Switching to torch or tensorflow will likely speed up these operations if your dataset is considerably large, e.g. > 5000 points (and, since you're using lidar returns, I'm guessing somewhere between 3 and 6 dimensions).
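As a rough sketch of what such a port might look like (this is an assumed torch translation of the numpy function above, not tested on a GPU; pass device="cuda" to move the work there):

```python
import numpy as np
import torch

def lidar_rv_projection_torch(points_np, proj_H=32, proj_W=2048,
                              proj_fov_up=10.0, proj_fov_down=-30.0,
                              device="cpu"):
    # Hypothetical torch port of the numpy version; most calls map one-to-one.
    points = torch.as_tensor(points_np, dtype=torch.float32, device=device)
    fov_down = proj_fov_down / 180.0 * np.pi
    fov_total = abs(fov_down) + abs(proj_fov_up / 180.0 * np.pi)

    depth = torch.linalg.norm(points[:, :3], dim=1)
    x_img = -torch.atan2(points[:, 1], points[:, 0])
    y_img = torch.asin(points[:, 2] / depth)

    proj_x = 0.5 * (x_img / np.pi + 1.0) * proj_W
    proj_y = (1.0 - (y_img + abs(fov_down)) / fov_total) * proj_H

    proj_x = torch.clamp(torch.floor(proj_x), 0, proj_W - 1).long()
    proj_y = torch.clamp(torch.floor(proj_y), 0, proj_H - 1).long()

    order = torch.argsort(depth, descending=True)  # far-to-near
    depth, points = depth[order], points[order]
    proj_x, proj_y = proj_x[order], proj_y[order]

    img = torch.full((4, proj_H, proj_W), -1.0, device=device)
    img[0, proj_y, proj_x] = depth         # range
    img[1, proj_y, proj_x] = points[:, 2]  # height z
    img[2, proj_y, proj_x] = points[:, 3]  # intensity r
    img[3, proj_y, proj_x] = 1.0           # binary mask
    return img, proj_x, proj_y, points
```

On CPU this will perform roughly like numpy; the payoff comes from keeping the tensors on a GPU device across the whole pipeline.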
Additionally, I noticed that you cast some variables to np.int32 and np.float64. If a more condensed representation is possible, that may help as well, e.g.:

- np.int16: limited to [-32768, 32767]; 16-bit signed representation.
- np.uint16: limited to [0, 65535]; 16-bit unsigned representation.
- np.int8: limited to [-128, 127]; 8-bit signed representation.
- np.uint8: limited to [0, 255]; 8-bit unsigned representation.

You can also try reducing the precision of your floating-point arithmetic, e.g. half-precision (np.float16) rather than single-precision (np.float32). Note that this will increase rounding error, though the degree depends on the scale of the data and the operations performed. For more information on NumPy data types, see the NumPy documentation.
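For instance, the memory (and bandwidth) savings from narrower dtypes are easy to verify; since proj_x never exceeds proj_W - 1 = 2047, it fits comfortably in int16 (a sketch with an illustrative array size):

```python
import numpy as np

n = 1_000_000
proj_x32 = np.zeros(n, dtype=np.int32)
proj_x16 = proj_x32.astype(np.int16)   # safe here: values stay within [0, 2047]

# int16 halves the bytes moved per element.
assert proj_x32.nbytes == 4 * n
assert proj_x16.nbytes == 2 * n

depth64 = np.zeros(n, dtype=np.float64)
depth32 = depth64.astype(np.float32)
assert depth32.nbytes == depth64.nbytes // 2
```

Halving the bytes moved matters because operations like the argsort and the fancy-indexed writes here are memory-bound rather than compute-bound.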
I assume you have a huge input dataset.

If you have an Nvidia GPU, the magic solution is to use the package cupy to do the work on the GPU. You barely have to change any code: just replace the numpy import with import cupy as np and your code should be magically much faster. On my machine (with a mid-range GPU and a good CPU), with an input of 5,000,000 points, this solution is 17 times faster!

If you have an AMD/Intel/Nvidia GPU, you can try the package clpy (which uses OpenCL rather than CUDA).

If you do not have a (powerful) GPU, then you can use the pnumpy package, which will make your code slightly faster (mainly by speeding up the sort, which is quite slow).
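The drop-in nature of cupy can also be made optional with a guarded import, so the same code runs on machines without a GPU (a sketch; cupy.asnumpy is the call that copies results back to host memory):

```python
try:
    import cupy as xp          # GPU path, if cupy and a CUDA device are available
    on_gpu = True
except ImportError:
    import numpy as xp         # CPU fallback: same API surface
    on_gpu = False

pts = xp.random.randn(10_000, 3)
depth = xp.linalg.norm(pts, axis=1)   # identical call in both packages
order = xp.argsort(depth)[::-1]       # the sort that dominates the CPU runtime
depth_sorted = depth[order]           # far-to-near

if on_gpu:
    depth_sorted = xp.asnumpy(depth_sorted)  # bring the result back to the host
```

Keeping everything behind the xp alias means the projection function itself never needs to know which backend it is running on.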