
Shared library vs. spawning a process: performance

I have an existing base of Python code (a Flask server). I need the server to perform a performance-critical operation, which I've decided to implement in C++. However, because the C++ part has other dependencies, my attempts with ctypes and Boost.Python went nowhere (unresolved symbols, missing libraries even after setting up the environment, and so on). I believe a suitable alternative would be to compile the C++ part into an executable (only a single function/procedure is needed) and run it from Python using commands or subprocess , communicating through stdin/stdout, for example. The only thing I'm worried about is that this will slow the procedure down enough to matter, and since I'm unable to build the shared object library and call its function from Python directly, I cannot benchmark the speedup.

When I compile the C++ code into an executable and run it on some sample data, the program takes ~5 s. That figure does not include spawning the process from Python or passing data between the two processes.
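For reference, the process-creation overhead alone can be estimated without the shared library by timing a do-nothing binary. A rough sketch, assuming Linux with /bin/true available:

import subprocess
import time

# Spawn a do-nothing binary repeatedly to estimate pure launch overhead.
N = 100
start = time.perf_counter()
for _ in range(N):
    subprocess.run(["/bin/true"])
print(f"average spawn overhead: {(time.perf_counter() - start) / N * 1000:.2f} ms")

Comparing that number to the ~5 s runtime shows how much the launch itself would contribute; it does not cover the cost of streaming the data through stdin/stdout.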

The question is: how big a speedup can one expect from calling a shared object via ctypes/Boost.Python, compared to creating a new process each time to run the procedure? If the number is big enough, it would motivate me to solve the problems described above; basically, I'm asking whether it's worth it.

If you're struggling to create bindings with Boost.Python, you can manually expose your API through plain C functions and call them via FFI.

Here's a simple example that illustrates the idea. First, you build a shared library as usual, but add some extra wrapper functions, which in the example below are placed in an extern "C" block. extern "C" is necessary because otherwise the function names get mangled, and their actual names in the library will differ from the ones you declared:

#include <cstddef>
#include <cstdint>
#include <cstdio>

#ifdef __GNUC__
#define EXPORT __attribute__ ((visibility("default")))
#else // __GNUC__
#error "Unsupported compiler"
#endif // __GNUC__

class data_processor {
public:
  data_processor() = default;

  void process_data(const std::uint8_t *data, std::size_t size) {
    std::printf("processing %zu bytes of data at %p\n", size,
                static_cast<const void *>(data));
  }
};

// Plain C wrappers around the C++ class: the object travels through the
// FFI boundary as an opaque void pointer, so Python never sees its layout.
extern "C" {
EXPORT void *create_processor() {
  return new data_processor();
}

EXPORT void free_processor(void *object) {
  delete static_cast<data_processor *>(object);
}

EXPORT void process_data(void *object, const std::uint8_t *data, const std::uint32_t size) {
  static_cast<data_processor *>(object)->process_data(data, size);
}
}
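To build this as a shared object with GCC (assuming the code above is saved as mylib.cpp; these are the usual flags, adjust as needed):

g++ -shared -fPIC -o mylib.so mylib.cpp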

Then you create the function bindings in Python. As you can see, the declarations are almost the same as in the .cpp file above. I used built-in types only (like void * and uint8_t), but FFI also lets you declare and use custom structs (see the sketch after the example below):

from cffi import FFI

mylib_api = FFI()
mylib_api.cdef("""
    void *create_processor();
    void free_processor(void *object);
    void process_data(void *object, const uint8_t *data, uint32_t size);
""")
mylib = mylib_api.dlopen("mylib.so-file-location")

processor = mylib.create_processor()
try:
    buffer = b"buffer"
    # from_buffer exposes the bytes object to C without copying it
    mylib.process_data(processor, mylib_api.from_buffer("uint8_t[]", buffer), len(buffer))
finally:
    mylib.free_processor(processor)
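On the custom-structs point above: cdef accepts ordinary C struct declarations as well. A hypothetical sketch (image_info and describe_image are made-up names, not part of the library above, and the C++ side would have to export a matching function):

# These declarations would normally sit alongside the earlier cdef, before dlopen.
mylib_api.cdef("""
    typedef struct { uint32_t width; uint32_t height; } image_info;
    void describe_image(const image_info *info);
""")

# cffi can allocate the struct and initialize its fields from a Python dict.
info = mylib_api.new("image_info *", {"width": 640, "height": 480})
mylib.describe_image(info)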

And that's basically it.

In my opinion, inter-process communication should be the last resort, for when nothing else works, since:

  • you need to put a lot of effort into implementing the details of your communication protocol, and even if you use something popular, there can still be plenty of issues, especially on the C++ side;
  • inter-process communication is generally more expensive in terms of processor time (the rough timing sketch below illustrates the difference).
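To put rough numbers on the second point: creating a process on a typical Linux machine costs on the order of a millisecond or more before any real work starts, while an in-process FFI call costs microseconds at most. A hedged sketch, reusing mylib, mylib_api, processor and buffer from the example above, and assuming the standalone executable from the question is available as ./processor:

import subprocess
import timeit

data = mylib_api.from_buffer("uint8_t[]", buffer)

# In-process path: only FFI dispatch overhead on top of the C++ work.
ffi_time = timeit.timeit(
    lambda: mylib.process_data(processor, data, len(buffer)), number=1000)

# Out-of-process path: fork/exec, pipe setup and data copying on every call.
proc_time = timeit.timeit(
    lambda: subprocess.run(["./processor"], input=buffer, capture_output=True),
    number=20)

print(f"FFI call:        {ffi_time / 1000 * 1e6:.1f} us per call")
print(f"subprocess call: {proc_time / 20 * 1e3:.1f} ms per call")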
