
Passing a list of strings from Python to C through pybind11

Following this post, I want to know how I can pass a list of strings from Python to C (i.e., using C headers and syntax, not C++) through pybind11. I'm fully aware that pybind11 is a C++ library and that the code must be compiled by a C++ compiler anyway. However, I find the C++ implementations difficult to understand, for example here and here.

Here I tried to pass a Python list of strings as pointers represented as integers, and then receive them as long* in C, but it didn't work.

The C/C++ code should be something like:

// example.cpp
#include <stdio.h>
#include <stdlib.h>

#include <pybind11/pybind11.h>

int run(/*<pure C or pybind11 datatypes> args*/){

    // if pybind11 data types are used convert them to pure C :
    // int argc = length of args
    // char* argv[] =  array of pointers to the strings in args, possible malloc

    for (int i = 0; i < argc; ++i) {
        printf("%s\n", argv[i]);
    } 

    // possible free

    return 0;
}

PYBIND11_MODULE(example, m) {

    m.def("run", &run, "runs the example");
}

A simple CMakeLists.txt example is also provided here, and the Python code can be something like this:

#example.py
import example

print(example.run(["Lorem", "ipsum", "dolor", "sit", "amet"]))

To avoid misunderstandings like this, please consider these points:

  • This is not an XY question, as the presumed Y problem has already been solved in the correct/canonical way using C++ headers/standard libraries and syntax (links above). The purpose of this question is pure curiosity. Solving the problem in a syntax I'm familiar with will help me to understand the underlying nature of pybind11 data types and functionality. Please do not try to find the Y problem and solve it.
  • I'm completely aware that pybind11 is a C++ library and the code must be compiled with a C++ compiler anyway.
  • I would appreciate it if you would consult me in the comments about required edits to my question, rather than doing it yourself. I know you want to help, but I have tried to frame my question as nicely as possible to avoid confusion.
  • I would appreciate it if you would avoid changing the non-commented parts of my C/C++ and Python code as much as possible.
  • I'm aware that using the term "C/C++" is wrong. I use the term to refer to a C++ code written in C syntax and using C headers. I'm sorry that I don't know a better way to call it.
  • As the commented parts of the example.cpp file indicate, it is OK to use pybind11 datatypes and then convert them to C. But I suspect a pure C solution might also be possible. For example, see this attempt.

Below I've reformatted the previous example code where I used C++ constructs, to only use C and pybind11 ones.

#include <pybind11/pybind11.h>
#include <stdio.h>

#if PY_VERSION_HEX < 0x03000000
#define MyPyText_AsString PyString_AsString
#else
#define MyPyText_AsString PyUnicode_AsUTF8
#endif

namespace py = pybind11;

int run(py::object pyargv11) {
    int argc = 0;
    char** argv = NULL;

    PyObject* pyargv = pyargv11.ptr();
    if (PySequence_Check(pyargv)) {
        Py_ssize_t sz = PySequence_Size(pyargv);
        argc = (int)sz;

        argv = (char**)malloc(sz * sizeof(char*));
        for (Py_ssize_t i = 0; i < sz; ++i) {
            PyObject* item = PySequence_GetItem(pyargv, i);
            argv[i] = (char*)MyPyText_AsString(item);
            Py_DECREF(item);
            if (!argv[i] || PyErr_Occurred()) {
                free(argv);
                argv = nullptr;
                break;
            }
        }
    }

    if (!argv) {
        //fprintf(stderr, "argument is not a sequence of strings\n");
        //return;
        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_TypeError, "could not convert input to argv");
        throw py::error_already_set();
    }

    for (int i = 0; i < argc; ++i)
        fprintf(stderr, "%s\n", argv[i]);

    free(argv);

    return 0;
}

PYBIND11_MODULE(example, m) {
    m.def("run", &run, "runs the example");
}

Below I comment on it heavily to explain what I'm doing and why.

In Python 2, string objects are char*-based; in Python 3, they are Unicode-based. Hence the following macro MyPyText_AsString, which changes behavior based on the Python version, since we need to get to a C-style "char*".

#if PY_VERSION_HEX < 0x03000000
#define MyPyText_AsString PyString_AsString
#else
#define MyPyText_AsString PyUnicode_AsUTF8
#endif

The pyargv11 py::object is a thin wrapper around a Python C-API object handle (a PyObject*); since the following code makes use of the Python C-API, it's easier to deal with the underlying PyObject* directly.

void closed_func_wrap(py::object pyargv11) {
    int argc = 0;            // the length that we'll pass
    char** argv = NULL;      // array of pointers to the strings

    // convert input list to C/C++ argc/argv :

    PyObject* pyargv = pyargv11.ptr();

The code only accepts containers that implement the sequence protocol and can thus be looped over. This covers the two most important ones, PyTuple and PyList, at the same time (albeit a tad slower than checking for those types directly, but it keeps the code more compact). To be fully generic, this code should also check for the iterator protocol (e.g. for generators) and probably reject str objects, but both are unlikely inputs.

    if (PySequence_Check(pyargv)) {

Okay, we have a sequence; now get its size. (This step is the reason why generators and other iterators would need the Python iterator protocol instead: their size is typically not known in advance, although you can request a hint.)

        Py_ssize_t sz = PySequence_Size(pyargv);

One part, the size, is done; store it in the variable that can be passed on to other functions.

        argc = (int)sz;

Now allocate the array of char* pointers (technically const char*, but that does not matter here, as we'll cast the const away).

        argv = (char**)malloc(sz * sizeof(char*));
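The cast and allocation above assume that PySequence_Size succeeded (it returns -1 on failure), that the size fits in an int, and that malloc did not return NULL. A minimal sketch of a more defensive variant follows. The helper name alloc_argv is hypothetical and not part of pybind11 or the Python C-API, and ptrdiff_t stands in for Py_ssize_t so that the snippet compiles without the Python headers. It also reserves one extra slot for a NULL terminator, which is the convention for the argv of a main().

```cpp
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper, shown for illustration only.
// `sz` stands in for the Py_ssize_t returned by PySequence_Size.
// Returns NULL (and sets *argc_out to 0) when the size is invalid or the
// allocation fails; otherwise the caller owns the array and must free() it.
static char** alloc_argv(ptrdiff_t sz, int* argc_out) {
    assert(argc_out != NULL);
    *argc_out = 0;

    // A negative size signals an error; a huge one would not fit in an int argc.
    if (sz < 0 || sz >= INT_MAX)
        return NULL;

    // One extra slot so that argv[argc] can hold the NULL terminator.
    char** argv = (char**)calloc((size_t)sz + 1, sizeof(char*));
    if (argv == NULL)
        return NULL;

    argv[sz] = NULL;   // explicit terminator, independent of calloc's zero fill
    *argc_out = (int)sz;
    return argv;
}
```

With such a helper, the conversion loop only has to fill slots 0 to argc-1, and a NULL return can be handled by the same "if (!argv)" check used further down.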

Next, loop over the sequence to retrieve the individual elements.

        for (Py_ssize_t i = 0; i < sz; ++i) {

This gets a single element from the sequence. The GetItem call is equivalent to Python's "[i]", i.e. a __getitem__ call.

            PyObject* item = PySequence_GetItem(pyargv, i);

As explained above, the macro MyPyText_AsString resolves to PyString_AsString on Python 2 and to PyUnicode_AsUTF8 on Python 3, and yields the C-style "char*" we need.

The cast from const char* to char* here is in principle safe, but the contents of argv[i] must NOT be modified by other functions. The same is true for the argv argument of a main() , so I'm assuming that to be the case.

Note that the C string is NOT copied. In Python 2, you simply get access to the underlying data; in Python 3, the converted string is kept as a data member of the Unicode object, and Python does the memory management. In both cases, the lifetime of the C string is guaranteed to be at least as long as that of the input Python object (pyargv11), so at least for the duration of this function call. If other functions decide to keep the pointers, copies would be needed.

            argv[i] = (char*)MyPyText_AsString(item);
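If a callee does need to keep the strings beyond the call, the pointers obtained above are not enough and each string has to be duplicated. The following is a minimal sketch of such a deep copy in C-style code. The names copy_argv and free_argv_copy are hypothetical helpers introduced here for illustration; they are not part of pybind11 or the Python C-API.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: releases an array produced by copy_argv.
static void free_argv_copy(int argc, char** argv) {
    if (argv == NULL)
        return;
    for (int i = 0; i < argc; ++i)
        free(argv[i]);      // free(NULL) is a no-op, so partial copies are fine
    free(argv);
}

// Hypothetical helper: duplicates every string so that the copies stay valid
// after the Python objects that own the originals are gone.
// Returns NULL on invalid input or allocation failure.
static char** copy_argv(int argc, const char* const* src) {
    assert(argc == 0 || src != NULL);
    if (argc < 0)
        return NULL;

    // Zero-filled, with one extra slot left as the NULL terminator.
    char** out = (char**)calloc((size_t)argc + 1, sizeof(char*));
    if (out == NULL)
        return NULL;

    for (int i = 0; i < argc; ++i) {
        size_t len = strlen(src[i]) + 1;    // include the trailing '\0'
        out[i] = (char*)malloc(len);
        if (out[i] == NULL) {
            free_argv_copy(i, out);         // release what was copied so far
            return NULL;
        }
        memcpy(out[i], src[i], len);
    }
    return out;
}
```

The copies are writable and independent of Python's memory management, at the cost of one allocation per string and an explicit free_argv_copy call when they are no longer needed.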

The result of PySequence_GetItem was a new reference, so now that we're done with it, drop it:

            Py_DECREF(item);

It is possible that the input sequence did not contain only Python str objects. In that case the conversion fails, and we need to check for that, or the code that later uses argv may segfault.

            if (!argv[i] || PyErr_Occurred()) {

Clean up the memory previously allocated.

                free(argv);

Set argv to NULL for success checking later on:

                argv = nullptr;

Give up on the loop:

                break;

If the given object was not a sequence, or if one of the elements of the sequence was not a string, then we don't have an argv and so we bail:

    if (!argv) {

The following is a bit lazy, but probably easier to understand if all you want to look at is C code.

        fprintf(stderr,  "argument is not a sequence of strings\n");
        return;

What you should really do is check whether an error was already set (e.g. because of a conversion problem) and set one if not. Then notify pybind11 of it. This will give you a clean Python exception on the caller's end. This goes like so:

        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_TypeError, "could not convert input to argv");
        throw py::error_already_set();       // by pybind11 convention.

Alright, if we get here, then we have an argc and argv , so now we can use them:

    for (int i = 0; i < argc; ++i)
        fprintf(stderr, "%s\n", argv[i]);

Finally, clean up the allocated memory.

    free(argv);

Notes:

  • I would still advocate for the use of at least std::unique_ptr as that makes life so much easier in case there are C++ exceptions thrown (from custom converters of any input object).
  • I was originally expecting to be able to replace all of the code with the one-liner std::vector<char*> pv{pyargv.cast<std::vector<char*>>()}; after #include <pybind11/stl.h>, but I found that this does not work (even though it compiles). Neither did using std::vector<std::string> (it also compiles, but also fails at run-time).
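To illustrate the first note, here is a minimal sketch of how std::unique_ptr with a custom deleter could own the malloc'ed pointer array, so that it is released even when an exception is thrown partway through. This is an assumption about how one might structure it, not code from the original example: ArgvDeleter, ArgvPtr, fill_argv, try_fill_argv and the counter g_release_count are all hypothetical names, and the exception is simulated with a flag instead of a real pybind11 converter.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <memory>
#include <new>
#include <stdexcept>

// Exists only to make the cleanup observable in this sketch.
static int g_release_count = 0;

// Deleter that releases the malloc'ed pointer array with free().
struct ArgvDeleter {
    void operator()(char** p) const {
        free(p);
        ++g_release_count;
    }
};

typedef std::unique_ptr<char*[], ArgvDeleter> ArgvPtr;

// Hypothetical stand-in for the conversion loop: throws when `fail` is true,
// the way a custom converter might.
static int fill_argv(int argc, bool fail) {
    assert(argc > 0);
    ArgvPtr argv((char**)malloc(argc * sizeof(char*)));
    if (!argv)
        throw std::bad_alloc();

    for (int i = 0; i < argc; ++i)
        argv[i] = NULL;     // a real version would store the C strings here

    if (fail)
        throw std::runtime_error("could not convert input to argv");

    return argc;            // ArgvDeleter runs here, and also on the throw above
}

// Returns argc on success and -1 if the conversion threw.
static int try_fill_argv(int argc, bool fail) {
    try {
        return fill_argv(argc, fail);
    } catch (const std::runtime_error&) {
        return -1;
    }
}
```

The point of the design is that there is no explicit free() call on either path: the array is released when argv goes out of scope, whether by a normal return or by stack unwinding.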

Just ask if anything is still unclear.

EDIT : If you truly only want to have a PyListObject, just call PyList_Check(pyargv11.ptr()) and if true, cast the result: PyListObject* pylist = (PyListObject*)pyargv11.ptr() . Now, if you want to work with py::list , you can also use the following code:

#include <pybind11/pybind11.h>
#include <stdio.h>

#if PY_VERSION_HEX < 0x03000000
#define MyPyText_AsString PyString_AsString
#else
#define MyPyText_AsString PyUnicode_AsUTF8
#endif

namespace py = pybind11;

int run(py::list inlist) {
    int argc = (int)inlist.size();
    char** argv = (char**)malloc(argc * sizeof(char*));

    for (int i = 0; i < argc; ++i)
        argv[i] = (char*)MyPyText_AsString(inlist[i].ptr());

    for (int i = 0; i < argc; ++i)
        fprintf(stderr, "%s\n", argv[i]);

    free(argv);

    return 0;
}

PYBIND11_MODULE(example, m) {
    m.def("run", &run, "runs the example");
}

This code is shorter only because it has less functionality: it accepts only lists, and its error handling is clunkier (e.g. it will leak if passed a list of integers, due to pybind11 throwing an exception; to fix that, use unique_ptr as in the very first example code so that argv is freed on exception).
