
pybind11 speeding up function calls

I have several C++ function objects that I construct in Python using pybind11. I then pass these objects from Python to another C++ function, which calls them. Because these function objects have state, they don't benefit from pybind11's optimization for stateless C++ functions (where pybind11 recovers the raw function pointer instead of calling back into Python), and the performance is very slow.

I can work around this with an ugly hack: return the address of the created C++ object to Python, which then passes that address back to the calling C++ function. However, I was hoping for a cleaner, more maintainable way to do this.

Here is some code that reproduces the problem (import_call_execute embeds a Python interpreter and runs the given function). It is based on: https://pythonextensionpatterns.readthedocs.io/en/latest/debugging/debug_in_ide.html

The first Python program below takes 163 milliseconds on my machine, while the second takes only 0.5 milliseconds.
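For scale: both benchmark loops make 100,000 calls (i runs from 0 to 99999), so the two timings correspond to roughly these per-call costs, a gap of about 326x:

$$
\frac{163\ \text{ms}}{10^{5}\ \text{calls}} = 1.63\ \mu\text{s per call}, \qquad
\frac{0.5\ \text{ms}}{10^{5}\ \text{calls}} = 5\ \text{ns per call}
$$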

#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <iostream>
#include <chrono>

#include "py_import_call_execute.hpp"

using namespace std;
using namespace std::chrono;
using namespace pybind11::literals;

namespace py = pybind11;

class TestFunc {
public:
    TestFunc(int a): _a(a) {}

    int operator()(int b) const {
        return _a + b;
    }

    size_t get_ptr() {
        return (size_t)this;
    }
private:
    int _a;
};

long long test_dummy_function(const std::function<int(int)> &f) {
    auto start = high_resolution_clock::now();

    // long long: the total (about 5e9) would overflow a 32-bit int, which is undefined behavior
    long long sum = 0;
    for (int i = 0; i < 100000; ++i) {
        sum += f(i);
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);

    cout << "sum: " << sum << " time: " << duration.count() / 1000.0 << " milliseconds" << endl;

    return sum;
}

long long test_dummy_function2(std::size_t ptr) {
    auto start = high_resolution_clock::now();

    TestFunc* f = reinterpret_cast<TestFunc*>(ptr);

    long long sum = 0;
    for (int i = 0; i < 100000; ++i) {
        sum += (*f)(i);
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);

    cout << "sum: " << sum << " time: " << duration.count() / 1000.0 << " milliseconds" << endl;

    return sum;
}

PYBIND11_MODULE(pybind_testing, m) {
    py::class_<TestFunc>(m, "TestFunc")
    .def(py::init<int>(), "a"_a)
    .def("__call__", &TestFunc::operator(), "b"_a = 3)
    .def("get_ptr", &TestFunc::get_ptr);

    m.def("test_dummy_function", test_dummy_function);
    m.def("test_dummy_function2", test_dummy_function2);
}

int main(int argc, const char *argv[]) {
    argc = 4;
    const char *argv2[] = {
            "python",
            "/Users/sal/Developer/coatbridge/testing/pybind11",
            "test_pybind11",
            "test_pybind11"};
    return import_call_execute(argc, argv2);
}

Python function 1:

import pybind_testing as pt

def test_pybind11():
    test_func = pt.TestFunc(2)
    pt.test_dummy_function(test_func)

Python function 2:

import pybind_testing as pt

def test_pybind11():
    test_func = pt.TestFunc(2)
    pt.test_dummy_function2(test_func.get_ptr())

Answer:

The poor performance has nothing to do with pybind11 or Python. It's slow because you're using std::function, which type-erases the callable and invokes it through an indirect call; that is nothing like a regular function call.

You can see this by replacing the code in main() with this:

TestFunc test_func(2);
test_dummy_function(test_func);
test_dummy_function2(test_func.get_ptr());
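A minimal, self-contained sketch of that experiment, assuming the question's TestFunc and the two test functions are copied into a plain C++ program with pybind11 and py_import_call_execute.hpp removed. The helper names report and run_experiment are hypothetical; call run_experiment() from your own main(). The sketch also uses long long for the running total (the sum is about 5e9) and std::uintptr_t for the address. No timings are claimed here; they depend on the compiler and optimization level.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>

// TestFunc as in the question, but get_ptr() returns std::uintptr_t.
class TestFunc {
public:
    explicit TestFunc(int a) : _a(a) {}
    int operator()(int b) const { return _a + b; }
    std::uintptr_t get_ptr() const { return reinterpret_cast<std::uintptr_t>(this); }

private:
    int _a;
};

using Clock = std::chrono::high_resolution_clock;

// Hypothetical helper: prints the loop total and the elapsed time in milliseconds.
void report(const char *label, long long sum, Clock::duration elapsed) {
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    std::cout << label << ": sum " << sum << ", time " << us / 1000.0 << " ms\n";
}

// 100000 calls through a type-erased std::function (the question's first version).
long long test_dummy_function(const std::function<int(int)> &f) {
    auto start = Clock::now();
    long long sum = 0;  // long long: the total overflows a 32-bit int
    for (int i = 0; i < 100000; ++i) sum += f(i);
    report("std::function", sum, Clock::now() - start);
    return sum;
}

// 100000 direct calls through the integer-to-pointer hack (the question's second version).
long long test_dummy_function2(std::uintptr_t ptr) {
    auto start = Clock::now();
    const TestFunc *f = reinterpret_cast<const TestFunc *>(ptr);
    long long sum = 0;
    for (int i = 0; i < 100000; ++i) sum += (*f)(i);
    report("raw pointer", sum, Clock::now() - start);
    return sum;
}

// Hypothetical stand-in for the replacement main() body suggested above.
long long run_experiment() {
    TestFunc test_func(2);
    long long a = test_dummy_function(test_func);
    long long b = test_dummy_function2(test_func.get_ptr());
    assert(a == b);  // both paths must produce the same total
    return a;
}
```

Because no Python is involved here, any difference between the two printed times comes only from the C++ calling mechanism.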

To fix it, simply stop using std::function. Pass the TestFunc object directly, by reference or by (smart?) pointer. There should be no need for the hack of casting its address to size_t and back again (though if you do need to do that, the correct type is uintptr_t, not size_t).
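A minimal sketch of that fix, assuming the loop body from the question is otherwise unchanged. The names test_dummy_function3, sum_calls and demo are hypothetical, not part of the question or of pybind11, and the timing code is left out to keep it short. The first function takes TestFunc by const reference; the template accepts any callable while still fixing its type at compile time.

```cpp
#include <cassert>

// TestFunc as in the question (get_ptr() omitted: the pointer hack is no longer needed).
class TestFunc {
public:
    explicit TestFunc(int a) : _a(a) {}
    int operator()(int b) const { return _a + b; }

private:
    int _a;
};

// Takes the functor by const reference: each call is a plain member call the compiler can inline.
long long test_dummy_function3(const TestFunc &f) {
    long long sum = 0;  // long long: the total overflows a 32-bit int
    for (int i = 0; i < 100000; ++i) sum += f(i);
    return sum;
}

// Generic alternative: works for any callable taking an int, with no type erasure.
template <typename F>
long long sum_calls(const F &f) {
    long long sum = 0;
    for (int i = 0; i < 100000; ++i) sum += f(i);
    return sum;
}

// Usage: both entry points accept the same object and agree on the result.
long long demo() {
    TestFunc f(2);
    long long s = test_dummy_function3(f);
    assert(s == sum_calls(f));
    return s;
}
```

On the pybind11 side, binding it with m.def("test_dummy_function3", &test_dummy_function3) should let Python call pt.test_dummy_function3(test_func) directly. Because TestFunc is registered with py::class_, pybind11 passes a reference to the existing C++ object, so the loop never calls back into Python. The template has to be bound as a concrete instantiation, for example &sum_calls<TestFunc>.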
