Improve speed of passing data from Python to C(++) via ctypes

I need to optimize a function call that is in a loop, for a time-critical robotics application. My script is written in Python and interfaces via ctypes with a C++ library I wrote, which in turn calls a microcontroller library.

The bottleneck is adding position-velocity-time points to the microcontroller buffer. According to my timing checks, calling the C++ function via ctypes takes about 0.45 seconds, while on the C++ side the called function takes 0.17 seconds. I need to reduce this difference somehow.

Here is the relevant Python code, where data is a 2D array of points and clibrary is loaded via ctypes:

data_np = np.vstack([nodes, positions, velocities, times]).transpose().astype(np.long)

data = ((c_long * 4) * N)()
for i in range(N):
    data[i] = (c_long * 4)(*data_np[i])

timer = time()
clibrary.addPvtAll(N, data)
print("clibrary.addPvtAll() call: %f" % (time() - timer))

And here is the called C++ function:

void addPvtAll(int N, long data[][4]) {

    clock_t t0, t1;
    t0 = clock();

    for(int i = 0; i < N; i++) {
        unsigned short node = (unsigned short)data[i][0];
        long p = data[i][1];
        long v = data[i][2];
        unsigned char t = (unsigned char)data[i][3];

        VCS_AddPvtValueToIpmBuffer(device(node), node, p, v, t, &errorCode);
    }

    t1 = clock();
    printf("addPvtAll() call: %f \n", (double(t1 - t0) / CLOCKS_PER_SEC));
}

I don't absolutely need to use ctypes but I don't want to have to compile the Python code every time I run it.

The round-trip between Python and C++ can be expensive, especially when using ctypes (which is like an interpreted version of a normal C/Python wrapper).

Your goal should be to minimize the number of trips and do the most work possible per trip.

It looks to me like your code is too fine-grained (i.e., it makes too many trips and does too little work on each one).

The numpy package can expose its data directly to C/C++. That will let you avoid the expensive boxing and unboxing of Python objects (with their attendant memory allocations) and it will let you pass a range of data points rather than a point at a time.

Modify your C++ code to process many points at a time rather than one per call (much like the sqlite3 module does with execute vs. executemany ).
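As a rough illustration of exposing NumPy data directly to C, the sketch below builds the same N x 4 layout as the question and obtains a ctypes pointer to the array's own memory, so no per-row ctypes objects are created. The sample values are made up, and the commented-out call assumes the addPvtAll signature from the question. The "l" type code (C long) is used because np.long is not available in every NumPy release.

```python
import ctypes
import numpy as np

# Hypothetical sample data standing in for nodes/positions/velocities/times.
nodes = [1, 2, 3]
positions = [100, 200, 300]
velocities = [10, 20, 30]
times = [5, 5, 5]
N = len(nodes)

# "l" is NumPy's type code for C long, so each element matches ctypes.c_long.
data_np = np.ascontiguousarray(
    np.vstack([nodes, positions, velocities, times]).transpose(),
    dtype=np.dtype("l"),
)
assert data_np.itemsize == ctypes.sizeof(ctypes.c_long)

# A pointer to rows of 4 longs, matching the C parameter `long data[][4]`.
# It points at the array's own buffer: nothing is copied or boxed.
RowType = ctypes.c_long * 4
data_ptr = data_np.ctypes.data_as(ctypes.POINTER(RowType))

# With the library loaded, the call would look like this (assumed, not run here):
#   clibrary.addPvtAll.argtypes = [ctypes.c_int, ctypes.POINTER(RowType)]
#   clibrary.addPvtAll(N, data_ptr)
```

Keep data_np alive for as long as the C side uses the pointer, since the pointer does not own the memory.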

Here is my solution, which effectively eliminates the measured time difference between Python and C. Credit to kirbyfan64sos for suggesting SWIG and Raymond Hettinger for C-arrays in numpy. I use a numpy array in Python which is sent to C purely as a pointer - the same memory block is accessed in both languages.

The C function remains identical except using gettimeofday() instead of clock() , which was giving inaccurate times:

void addPvtFrame(int pvt[6][4]) {

    timeval start,stop,result;
    gettimeofday(&start, NULL);

    for(int i = 0; i < 6; i++) {
        unsigned short node = (unsigned short)pvt[i][0];
        long p = (long)pvt[i][1];
        long v = (long)pvt[i][2];
        unsigned char t = (unsigned char)pvt[i][3];

        VCS_AddPvtValueToIpmBuffer(device(node), node, p, v, t, &errorCode);
    }

    gettimeofday(&stop, NULL);
    timersub(&stop, &start, &result);  // elapsed = stop - start
    printf("Add PVT time in C code: %fs\n", result.tv_sec + result.tv_usec/1000000.0);
}
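The choice of timer can matter here: on POSIX systems clock() reports processor time, so time a function spends blocked (for example, waiting on a device) is not counted, whereas gettimeofday() reports wall-clock time. Python draws the same distinction between time.process_time() and time.perf_counter(). The sketch below uses time.sleep as a stand-in for a blocking device call; whether the real library call blocks in this way is an assumption.

```python
import time

def timed_sleep(seconds):
    """Return (wall_elapsed, cpu_elapsed) for a call that mostly waits."""
    wall0 = time.perf_counter()   # wall-clock timer
    cpu0 = time.process_time()    # CPU time used by this process only
    time.sleep(seconds)           # stand-in for blocking on a device
    return time.perf_counter() - wall0, time.process_time() - cpu0

wall, cpu = timed_sleep(0.2)
print("wall: %.3fs, cpu: %.3fs" % (wall, cpu))
```

If the two sides of a Python/C boundary are timed with different kinds of clock, part of the apparent overhead may simply be waiting time that one clock counts and the other does not.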

In addition, I installed SWIG and included the following in my interfaces file:

%include "numpy.i"
%init %{
    import_array();
%}

%apply ( int INPLACE_ARRAY2[ANY][ANY] ) {(int pvt[6][4])}

Finally, my Python code constructs pvt as a contiguous array via numpy:

pvt = np.vstack([nodes, positions, velocities, times])
pvt = np.ascontiguousarray(pvt.transpose().astype(int))

timer = time()
xjus.addPvtFrame(pvt)
print("Add PVT time to C code: %fs" % (time() - timer))

The measured times now differ by about 1% on my machine.
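The np.ascontiguousarray step is worth a closer look: transposing a NumPy array only changes the strides, so the transposed array is not laid out row by row in memory the way a C int[6][4] is. The sketch below checks this with made-up values for six points. It uses np.intc (the dtype for C int) instead of the answer's astype(int), which is an assumption made here so that the element size matches the int parameter on every platform.

```python
import numpy as np

# Hypothetical values for 6 points; the names mirror the answer's code.
nodes = [1, 2, 3, 4, 5, 6]
positions = [100, 200, 300, 400, 500, 600]
velocities = [10, 20, 30, 40, 50, 60]
times = [5, 5, 5, 5, 5, 5]

stacked = np.vstack([nodes, positions, velocities, times])  # shape (4, 6)

# The transpose has shape (6, 4) but keeps the original memory order.
transposed = stacked.transpose().astype(np.intc)

# ascontiguousarray makes a row-major copy, matching C's int[6][4] layout.
pvt = np.ascontiguousarray(transposed)

print(transposed.flags["C_CONTIGUOUS"], pvt.flags["C_CONTIGUOUS"])
```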

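You can just pass the raw bytes of the array with data_np.data.tobytes() , which hands the C function one contiguous buffer (a copy of the array's data):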

data_np = np.vstack([nodes, positions, velocities, times]).transpose().astype(np.long)
timer = time()
clibrary.addPvtAll(N, data_np.data.tobytes())
print("clibrary.addPvtAll() call: %f" % (time() - timer))
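One thing to keep in mind with this approach is that tobytes() returns a copy, so anything the C function writes into the buffer would not show up in the NumPy array. The sketch below, using made-up values and the "l" (C long) type code, checks the size of the byte string and shows that it is independent of the array.

```python
import ctypes
import numpy as np

# Hypothetical 2 x 4 block of points stored as C longs.
data_np = np.array([[1, 100, 10, 5], [2, 200, 20, 5]], dtype=np.dtype("l"))

raw = data_np.data.tobytes()  # one bytes object holding all rows in C order

# Reinterpret the bytes as longs to confirm the layout survived.
view = np.frombuffer(raw, dtype=np.dtype("l")).reshape(data_np.shape)

# Changing the array afterwards does not affect the bytes: they are a copy.
data_np[0, 0] = 99
print(len(raw), view[0, 0], data_np[0, 0])
```

For read-only input such as the PVT points this copy is harmless, though it costs one extra pass over the data per call.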
