
Speeding up numpy

Is there a way to speed up the following code snippet? It is a function that takes lidar points and converts them to a range-view image. Any suggestions would be appreciated. I tried using numba but didn't get much improvement.

import numpy as np


def lidar_rv_projection(points, proj_H=32, proj_W=2048, proj_fov_up=10.0, proj_fov_down=-30.0):

    # Vertical field of view, converted to radians.
    v_fov_up = proj_fov_up / 180.0 * np.pi
    v_fov_down = proj_fov_down / 180.0 * np.pi
    v_fov_total = abs(v_fov_down) + abs(v_fov_up)

    # Euclidean distance of each point from the sensor.
    depth = np.linalg.norm(points[:, :3], 2, axis=1)

    x_points = points[:, 0]
    y_points = points[:, 1]
    z_points = points[:, 2]

    # Horizontal (azimuth) and vertical (elevation) angles.
    x_img = np.arctan2(y_points, x_points) * -1
    y_img = np.arcsin(z_points / depth)

    # Normalize both angles to [0, 1] image coordinates.
    proj_x = 0.5 * (x_img / np.pi + 1.0)
    proj_y = 1.0 - (y_img + abs(v_fov_down)) / v_fov_total

    # Scale to pixel coordinates.
    proj_x *= proj_W
    proj_y *= proj_H

    proj_x = np.floor(proj_x)
    proj_x = np.minimum(proj_W - 1, proj_x)
    proj_x = np.maximum(0, proj_x).astype(np.int32)  # in [0, W-1]

    proj_y = np.floor(proj_y)
    proj_y = np.minimum(proj_H - 1, proj_y)
    proj_y = np.maximum(0, proj_y).astype(np.int32)  # in [0, H-1]

    # Sort by decreasing depth so that when several points land in the
    # same pixel, the closest point is written last and wins.
    order = np.argsort(depth)[::-1]
    depth = depth[order]
    points = points[order]
    proj_y = proj_y[order]
    proj_x = proj_x[order]

    proj_rv_img = np.full((4, proj_H, proj_W), -1, dtype=np.float64)
    proj_rv_img[0, proj_y, proj_x] = depth  # range
    proj_rv_img[1, proj_y, proj_x] = points[:, 2]  # height z
    proj_rv_img[2, proj_y, proj_x] = points[:, 3]  # intensity r
    proj_rv_img[3, proj_y, proj_x] = 1  # binary mask

    return proj_rv_img, proj_x, proj_y, points
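
For reference, this is how I time it on synthetic data (illustrative only; my real point clouds are (N, 4) arrays with intensity in the last column):

import time
import numpy as np

pts = np.random.uniform(-50.0, 50.0, size=(5_000_000, 4))
t0 = time.perf_counter()
proj_rv_img, proj_x, proj_y, _ = lidar_rv_projection(pts)
print(f"projection took {time.perf_counter() - t0:.3f} s")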

If you're considering other numerical packages, using torch or tensorflow (especially if you have access to a GPU) may help substantially. Fortunately, most torch and tensorflow functions mirror their numpy counterparts, so you probably wouldn't have to change many calls.

Switching to torch or tensorflow will likely speed up these operations if your dataset is considerably large, e.g. more than 5,000 points (and, since you're working with lidar returns, I'm guessing somewhere between 3 and 6 dimensions per point).
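
For illustration, here is a minimal sketch of the core projection math ported to torch. This is a hypothetical port, not a drop-in for the full function; it assumes points is an (N, 4) float tensor, possibly already moved to the GPU:

import math
import torch

def lidar_rv_projection_torch(points, proj_H=32, proj_W=2048,
                              proj_fov_up=10.0, proj_fov_down=-30.0):
    v_fov_up = proj_fov_up / 180.0 * math.pi
    v_fov_down = proj_fov_down / 180.0 * math.pi
    v_fov_total = abs(v_fov_down) + abs(v_fov_up)

    depth = torch.linalg.norm(points[:, :3], dim=1)   # np.linalg.norm(..., 2, axis=1)
    x_img = -torch.atan2(points[:, 1], points[:, 0])  # np.arctan2 -> torch.atan2
    y_img = torch.asin(points[:, 2] / depth)          # np.arcsin  -> torch.asin

    # Same normalization and clamping as the NumPy version.
    proj_x = (0.5 * (x_img / math.pi + 1.0) * proj_W).floor().clamp(0, proj_W - 1).long()
    proj_y = ((1.0 - (y_img + abs(v_fov_down)) / v_fov_total) * proj_H).floor().clamp(0, proj_H - 1).long()
    return depth, proj_x, proj_y

The sort and scatter steps translate the same way (torch.argsort plus advanced indexing); moving points to the GPU beforehand is where the real gains would come from.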

Additionally, I noticed that you cast some variables to np.int32 and store the image as np.float64. If you can use a more compact representation, that may help as well, e.g. (see the sketch after this list):

  1. np.int16: Limited to [-32768, 32767]. 16-bit signed representation.
  2. np.uint16: Limited to [0, 65535]. 16-bit unsigned representation.
  3. np.int8: Limited to [-128, 127]. 8-bit signed representation.
  4. np.uint8: Limited to [0, 255]. 8-bit unsigned representation.
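
For example, with proj_W=2048 and proj_H=32 the clipped pixel indices fit comfortably in 16 bits. This is illustrative only; whether it actually helps depends on your array sizes and memory bandwidth:

import numpy as np

proj_W = 2048
proj_x = np.random.uniform(0, proj_W, size=1_000_000)
proj_x = np.minimum(proj_W - 1, np.floor(proj_x))
proj_x = np.maximum(0, proj_x).astype(np.int16)  # np.int16 instead of np.int32
print(proj_x.dtype, proj_x.max())                # int16 2047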

You can also try reducing the precision of your floating-point arithmetic, e.g. half-precision (np.float16) rather than single-precision (np.float32). Note that this will increase rounding error, though the degree depends on the scale of the data and the operations performed. For more information on NumPy data types, see the NumPy documentation on data types.
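
A quick way to gauge the size of that error on data shaped like yours (random coordinates here, purely illustrative):

import numpy as np

pts = np.random.uniform(-100.0, 100.0, size=(1_000_000, 3))
d64 = np.linalg.norm(pts, 2, axis=1)                     # float64 reference
d16 = np.linalg.norm(pts.astype(np.float16), 2, axis=1)  # half precision
print(np.abs(d64 - d16).max())                           # worst-case absolute error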

I assume you have a huge input dataset.

If you have an Nvidia GPU, the magic solution is to use the cupy package to do the work on the GPU. You barely have to change any code: just import cupy as np in place of numpy, and your code should be magically much faster. On my machine (with a mid-range GPU and a good CPU), with an input of 5,000,000 points, this solution is 17 times faster!
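
A minimal sketch of that swap (it assumes a CUDA-capable GPU with cupy installed; points_cpu stands in for your real (N, 4) NumPy array):

import cupy as np  # instead of: import numpy as np

points_gpu = np.asarray(points_cpu)         # copy the point cloud to the GPU
proj_rv_img, proj_x, proj_y, pts = lidar_rv_projection(points_gpu)
proj_rv_img_host = np.asnumpy(proj_rv_img)  # copy the result back to host memory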

If you have an AMD/Intel/Nvidia GPU, you can try the clpy package (which uses OpenCL rather than CUDA).

If you do not have a (powerful) GPU, then you can use the pnumpy package, which will make your code slightly faster (mainly by speeding up the sort, which is quite slow).
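
A sketch of the pnumpy hook (assuming the initialize() entry point its README describes; verify against the version you install):

import pnumpy
pnumpy.initialize()  # assumed entry point: patches NumPy's inner loops to run multi-threaded

import numpy as np   # the unmodified lidar_rv_projection then benefits automatically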
