
How to (de-)serialize PyObject* in C(++)?

I'm currently working on a multi-threaded python module in C(++). I'm almost done, but one of the last things I need to do is to find a way around the GIL, so that communication between threads becomes possible.

To do this, I wish to attempt the following pseudo code:

// Called from Python
PyObject* send_data(data, procid) {
    // Change the Python object to byte data and
    // store it outside of Python's memory management.
    serialized = serialize(data);

    // Send the byte data to the desired processor
    // (stored in a queue on that processor)
    send(serialized, procid);

    Py_RETURN_NONE;
}

// Called from Python
PyObject* receive_data() {
    // Grab data from queue
    serialized = grab_data();

    // De-serialize data
    data = de_serialize(serialized);

    return data;
}

The reason I want to serialize the data before sending it is that the size in memory of the sent data has to be known. sizeof(PyObject*) is always 8 here, since it is only the size of the pointer and not of the object it points to, so serializing is my attempt to make sure the size is always correct.

Now, I found a way to serialize the data in python using pickle , but don't know how I could transfer this to C in a manner that is computationally acceptable. (So without calling a function which starts a python instance, imports the right library and sends the pickle function as a callable to C.)

Any help in achieving this would be hugely appreciated!

Of course, if you know how to get accurate size data from the PyObjects and know how to clone them to C, that would also be great! ^_^'

As you're after handling serialised data in two separate languages, how about a language-agnostic serialisation standard? This means a schema first approach.

This is the best approach to avoid writing every data structure definition twice, once in C and once in Python, particularly useful if you have complicated data structures.

For both Python and C, the choices are a bit thin. There's Google Protocol Buffers (which also has a C version), Apache Avro will probably work too, and ASN.1 if you're feeling brave (there are commercial tools, and it is worth taking a look on GitHub for some free ones).

The general approach with all of these is to generate Python, C (or C#, Java, C++) source code from a schema. The source code defines data structures, and the functions / methods required to serialise / deserialise those to a common wireformat. The output / input to these functions is an array of bytes, so their content is not going to involve the GIL when used on the C side.
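To make the shape of such generated code concrete, here is a minimal hand-written sketch in C. It is not the output or API of Protocol Buffers, Avro, or any ASN.1 tool; the `Sample` struct, the `sample_encode` / `sample_decode` names, and the wire layout are all made up for illustration. It assumes an 8-byte IEEE 754 `double`. The point is only that both directions work on a plain byte array whose length is known, with no Python objects involved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message; a schema compiler would generate something similar. */
typedef struct {
    int32_t id;
    double  value;
    char    name[32];   /* NUL-terminated */
} Sample;

/* Fixed little-endian helpers so the wire format does not depend on the host. */
static void put_u32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = (uint8_t)(v >> (8 * i));
}
static uint32_t get_u32(const uint8_t *p) {
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= (uint32_t)p[i] << (8 * i);
    return v;
}
static void put_u64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; ++i) p[i] = (uint8_t)(v >> (8 * i));
}
static uint64_t get_u64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Wire layout (assumed for this sketch):
 * [u32 payload length][i32 id][f64 value][u32 name length][name bytes]
 * Returns a malloc'd buffer (caller frees) and stores its size in *out_len. */
uint8_t *sample_encode(const Sample *msg, size_t *out_len) {
    assert(msg != NULL && out_len != NULL);
    const char *end = memchr(msg->name, '\0', sizeof msg->name);
    uint32_t name_len = end ? (uint32_t)(end - msg->name)
                            : (uint32_t)(sizeof msg->name - 1);
    uint32_t payload = 16 + name_len;
    uint8_t *buf = malloc(4 + (size_t)payload);
    if (buf == NULL) return NULL;

    uint64_t bits;
    memcpy(&bits, &msg->value, sizeof bits);   /* copy the double's raw bits */
    put_u32(buf, payload);
    put_u32(buf + 4, (uint32_t)msg->id);
    put_u64(buf + 8, bits);
    put_u32(buf + 16, name_len);
    memcpy(buf + 20, msg->name, name_len);
    *out_len = 4 + (size_t)payload;
    return buf;
}

/* Returns 0 on success, -1 if the buffer is truncated or inconsistent. */
int sample_decode(const uint8_t *buf, size_t len, Sample *out) {
    if (buf == NULL || out == NULL || len < 20) return -1;
    uint32_t payload  = get_u32(buf);
    uint32_t name_len = get_u32(buf + 16);
    if ((size_t)payload + 4 != len) return -1;
    if (name_len >= sizeof out->name || payload != 16 + name_len) return -1;

    uint64_t bits = get_u64(buf + 8);
    out->id = (int32_t)get_u32(buf + 4);
    memcpy(&out->value, &bits, sizeof bits);
    memcpy(out->name, buf + 20, name_len);
    out->name[name_len] = '\0';
    return 0;
}
```

The buffer returned by sample_encode is what would go into the per-processor queue; the receiving side only needs the bytes and their length to rebuild the struct.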

With mature tools that work properly, this is a very liberating way of exchanging data; you can mix languages in a system as required. ASN.1 in particular is very good, as its constraints system allows one to be very specific about what is, and what is not, valid data. Strong interfaces! Google Protocol Buffers is almost perfect (because it's free and does almost everything), but has no constraints.

If you have only simple data structures, the overhead of writing every structure definition twice might not be so bad. In that case any decent Pickle library for C might do nicely, so long as it doesn't just build a Python object from the Pickle.


 