Matrix multiplication of a 2d numpy array to cpp using ctypes

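What is the correct way to do matrix multiplication using ctypes?

In my current implementation, passing the data back and forth consumes a lot of time. Is there any way to optimize it, for example by passing the array address and returning a pointer instead of generating the whole array with the .contents method?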

cpp_function.cpp

Compiled with g++ -shared -fPIC cpp_function.cpp -o cpp_function.so

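#include <cstddef>  // size_t
#include <cstdio>   // only needed if the printf lines below are re-enabled

extern "C" {

// Multiplies a1 (a1_h x a1_w) by a2 (a2_h x a2_w); both are flat row-major buffers.
// Returns a new[]-allocated row-major a1_h x a2_w array; size must be at least a1_h * a2_w.
// The caller owns the returned buffer; nothing in this file frees it.
double* mult_matrix(double *a1, double *a2, size_t a1_h, size_t a1_w,
                  size_t a2_h, size_t a2_w, int size)
{
    double* ret_arr = new double[size];
    for (size_t i = 0; i < a1_h; i++) {
        for (size_t j = 0; j < a2_w; j++) {
            double val = 0;
            for (size_t k = 0; k < a2_h; k++) {
                // Row-major offset is row * row_width + column, so the
                // stride is the matrix width (not its height).
                val += a1[i * a1_w + k] * a2[k * a2_w + j];
            }
            ret_arr[i * a2_w + j] = val;
            // printf("%f ", ret_arr[i * a2_w + j]);
        }
        // printf("\n");
    }
    return ret_arr;
}

}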
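
The C++ function receives each matrix as a flat, row-major buffer, so element (i, j) of a matrix with w columns sits at offset i * w + j. One way to sanity-check that index arithmetic is to mirror the triple loop in Python on a non-square input and compare against NumPy. The sketch below is only an illustration: `matmul_flat` is a hypothetical helper, not part of the original code.

```python
import numpy

def matmul_flat(a1, a2, a1_h, a1_w, a2_h, a2_w):
    """Hypothetical pure-Python mirror of the triple loop on flat row-major data."""
    if a1_w != a2_h:
        raise ValueError("inner dimensions must match")
    out = [0.0] * (a1_h * a2_w)
    for i in range(a1_h):
        for j in range(a2_w):
            val = 0.0
            for k in range(a2_h):
                # The stride of each matrix is its width, not its height.
                val += a1[i * a1_w + k] * a2[k * a2_w + j]
            out[i * a2_w + j] = val
    return out

# Non-square inputs expose stride mistakes that square inputs hide.
a = numpy.arange(6, dtype=numpy.float64).reshape(2, 3)
b = numpy.arange(12, dtype=numpy.float64).reshape(3, 4)

flat = matmul_flat(a.ravel().tolist(), b.ravel().tolist(), *a.shape, *b.shape)
result = numpy.array(flat).reshape(a.shape[0], b.shape[1])
print(numpy.allclose(result, numpy.dot(a, b)))
```

With square inputs such as the 300 x 300 test case, height and width are interchangeable, so a stride mistake would go unnoticed there.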

Python file that calls the .so file: main.py

import ctypes
import numpy
from time import time

libmatmult = ctypes.CDLL("./cpp_function.so")
ND_POINTER_1 = numpy.ctypeslib.ndpointer(dtype=numpy.float64, 
                                      ndim=2,
                                      flags="C")
ND_POINTER_2 = numpy.ctypeslib.ndpointer(dtype=numpy.float64, 
                                    ndim=2,
                                    flags="C")
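# Declare all seven parameters so ctypes converts each one to the type the C++ signature expects.
libmatmult.mult_matrix.argtypes = [ND_POINTER_1, ND_POINTER_2,
                                   ctypes.c_size_t, ctypes.c_size_t,
                                   ctypes.c_size_t, ctypes.c_size_t,
                                   ctypes.c_int]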

def mult_matrix_cpp(a,b):
    size = a.shape[0] * b.shape[1]  # the product has a.shape[0] x b.shape[1] elements
    libmatmult.mult_matrix.restype = ctypes.POINTER(ctypes.c_double * size)
    ret_cpp = libmatmult.mult_matrix(a, b, *a.shape, *b.shape, size)
    out_list_c = [i for i in ret_cpp.contents] # <---- regenerating the list, which is time consuming
    return out_list_c

size_a = (300,300)
size_b = size_a

a = numpy.random.uniform(low=1, high=255, size=size_a)
b = numpy.random.uniform(low=1, high=255, size=size_b)

t2 = time()
out_cpp = mult_matrix_cpp(a,b)
print("cpp time taken:{:.2f} ms".format((time() - t2) * 1000))
out_cpp = numpy.array(out_cpp).reshape(size_a[0], size_a[1])

t3 = time()
out_np = numpy.dot(a,b)
# print(out_np)
print("Numpy dot() time taken:{:.2f} ms".format((time() - t3) * 1000))

This solution works, but it is time consuming. Is there any way to make it faster?

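One reason for the time consumption is that the return value is not an ndpointer and is instead copied into a Python list. Use the following restype instead. You won't need the later reshape either. But take the commenters' advice and don't reinvent the wheel.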

def mult_matrix_cpp(a, b):
    # The product of an (m x k) and a (k x n) matrix has shape (m, n).
    out_shape = (a.shape[0], b.shape[1])
    libmatmult.mult_matrix.restype = numpy.ctypeslib.ndpointer(dtype=numpy.float64, ndim=2, shape=out_shape, flags="C")
    return libmatmult.mult_matrix(a, b, *a.shape, *b.shape, out_shape[0] * out_shape[1])
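
The gain comes from wrapping the returned buffer instead of copying it element by element. The same effect can be seen without the shared library. In this minimal sketch a ctypes array stands in for the memory that `mult_matrix` would return (an assumption made purely for illustration), and `numpy.ctypeslib.as_array` builds an ndarray view over it.

```python
import ctypes
import numpy

# Stand-in for a buffer returned from C++ (assumption: 2 x 3 doubles, row-major).
buf = (ctypes.c_double * 6)(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

# Slow path from the question: one Python float object is created per element.
as_list = [x for x in buf]

# Fast path: wrap the pointer as an ndarray without copying the data.
view = numpy.ctypeslib.as_array(ptr, shape=(2, 3))

# The view shares memory with the buffer, so a write to the buffer shows up in the view.
buf[0] = 42.0
shares_memory = bool(view[0, 0] == 42.0)

# An explicit copy owns its data and is independent of the original buffer.
owned = view.copy()
buf[0] = 0.0
```

A view like this does not own its memory, so NumPy will not free a buffer that was allocated with `new[]` on the C++ side. The code in this thread never releases that buffer. One possible approach, not shown in the original, would be to copy the result and release the buffer through a separately exported function.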
