PyArg_ParseTuple causes segfault when passing pointer instead of address

Question

While learning to write Python C extension modules with CPython's C API, I ran into a curious segfault (disclaimer: I have only a passing familiarity with C). A typical module method, written in C and then imported into Python, might look like this:

static PyObject *method_echo(PyObject *self, PyObject *args) {
    long a;
    if(!PyArg_ParseTuple(args, "l", &a)) {
        return NULL;
    }
    printf("Value of the passed variable is: %li\n", a);
    return PyLong_FromLong(a);
}

This works for me without issue. The problem appears if I instead declare a as a pointer and pass it to PyArg_ParseTuple, for example by changing the relevant lines to:

    long *a;
    if(!PyArg_ParseTuple(args, "l", a)) {
        return NULL;
    }

(and of course modifying the remaining lines to work with a pointer). This results in a segfault. However, if I remove the return NULL line:

    long *a;
    PyArg_ParseTuple(args, "l", a);

This runs without issue. Even though the return NULL statement never gets executed (I checked that explicitly with a printf in the conditional block), its presence somehow causes a segfault when I pass a pointer to PyArg_ParseTuple. Any ideas what's going on?

Here are some details of my system, followed by some example code that should be able to reproduce the problem:

macOS 11.6, Python 3.9, C compiler: clang (clang-1300.0.29.30)

C extension module (which will import in Python as test1_pptr):

test1_parsepointer.c

#define PY_SSIZE_T_CLEAN
#include <python3.9/Python.h>

static PyObject *method_parse_ptr1(PyObject *self, PyObject *args) {
    long *a;
    if(!PyArg_ParseTuple(args, "l",a)) {
        printf("PROBLEM ENCOUNTERED\n");
    };
    printf("  ptr-v1: Value of var is: %li\n", *a);
    return PyLong_FromLong(*a);
}

static PyObject *method_parse_ptr2(PyObject *self, PyObject *args) {
    long *a;
    if(!PyArg_ParseTuple(args, "l",a)) {
        return NULL;
    };
    printf("  ptr-v2: Value of var is: %li\n", *a);
    return PyLong_FromLong(*a);
}

static PyObject *method_parse_val(PyObject *self, PyObject *args) {
    long a;
    if(!PyArg_ParseTuple(args, "l",&a)) {
        return NULL;
    };
    printf("     val: Value of var is: %li\n", a);
    return PyLong_FromLong(a);
    
}

static PyMethodDef parseptr_methods[] = {
    {"parse_ptr_v1", method_parse_ptr1, METH_VARARGS, "Parse as pointer, no NULL"},
    {"parse_ptr_v2", method_parse_ptr2, METH_VARARGS, "Parse as pointer, with NULL"},
    {"parse_val", method_parse_val, METH_VARARGS, "Parse as val, with NULL"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef parsing_ptrs = {
    PyModuleDef_HEAD_INIT,
    "test1_pptr",
    "Testing PyArg_ParseTuple vars as pointers",
    -1,
    parseptr_methods
};

PyMODINIT_FUNC PyInit_test1_pptr(void) {
    return PyModule_Create(&parsing_ptrs);
}

I compile this with the following command:

clang -shared -undefined dynamic_lookup -o test1_parsepointer.so test1_parsepointer.c

Create a .py file that bootstraps this module upon import:

test1_pptr.py:

def __bootstrap__():
    global __bootstrap__, __loader__, __file__
    import sys, pkg_resources, importlib.util
    __file__ = pkg_resources.resource_filename(__name__, 'test1_parsepointer.so')
    __loader__ = None; del __bootstrap__, __loader__
    spec = importlib.util.spec_from_file_location(__name__,__file__)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
__bootstrap__()

And finally, the methods can be tested with the following python script:

import test1_pptr as tppr

"""
Three functions in tppr should be:
    parse_ptr_v1(int)
    parse_ptr_v2(int)
    parse_val
"""

def main():
    a = int(3)
    print("about to test parse-by-value...")
    tppr.parse_val(a) # runs fine

    print("about to test parse-by-pointer v1...")
    tppr.parse_ptr_v1(a) # runs fine
    
    print("about to test parse-by-pointer v2...")
    tppr.parse_ptr_v2(a) # segfaults

if __name__ == "__main__":
    main()

Answer 1

long *a;

This doesn't point to anything valid because you haven't initialized it (either by allocating memory for a long or by taking the address of an existing long).

if(!PyArg_ParseTuple(args, "l", a))

This attempts to write into whatever a points to. But a doesn't point to a valid long. Therefore it crashes.

The fact that it seems to work in some cases is not meaningful. Writing through an invalid pointer is undefined behaviour. In practice, what the uninitialized a happens to point at is arbitrary and can change with any small change to the surrounding code, such as removing the return NULL line. There's no value in trying to understand why one variant happens to run.
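To make the two valid options concrete, here is a minimal standalone sketch. It uses sscanf as a stand-in for PyArg_ParseTuple, on the assumption that the relevant contract is the same: the caller supplies the address of storage that already exists, and the function writes the parsed value into it. The function names and the -1 error value are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Option 1: declare a real long and pass its address.
   This mirrors `long a; PyArg_ParseTuple(args, "l", &a)`. */
long parse_into_local(const char *text) {
    long value = 0;
    assert(text != NULL);
    if (sscanf(text, "%ld", &value) != 1) {
        return -1; /* hypothetical error value for this sketch */
    }
    return value;
}

/* Option 2: keep the pointer, but make it point at valid
   storage before anything writes through it. */
long parse_into_heap(const char *text) {
    long result = -1; /* hypothetical error value for this sketch */
    long *value = malloc(sizeof *value);
    assert(text != NULL);
    if (value == NULL) {
        return result; /* allocation failed */
    }
    if (sscanf(text, "%ld", value) == 1) {
        result = *value;
    }
    free(value);
    return result;
}
```

In the extension module, option 1 is what method_parse_val from the question already does, and it is usually the simpler choice. Option 2 only makes sense when the value has to outlive the function call.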
