How to convert a C binary buffer to its hex representation in a Python string?

It's well known that pysha3 isn't compatible with PyPy, and because it has been unmaintained for 3 years, I have to modify it myself.

Of course, the proper way would be a complete rewrite in pure Python code (which would also result in a faster implementation than the current one), but I lack the required knowledge of both cryptography and the background math to do this, and the program using it is very list-intensive (which requires either a Python 3 without a GIL for multithreading or a Python 3 with a JIT).

The single point of failure boils down to this function which has to be called by C code:

static PyObject*
_Py_strhex(const char* argbuf, const Py_ssize_t arglen)
{
    static const char *hexdigits = "0123456789abcdef";

    PyObject *retval;
#if PY_MAJOR_VERSION >= 3
    Py_UCS1 *retbuf;
#else
    char *retbuf;
#endif
    Py_ssize_t i, j;

    assert(arglen >= 0);
    if (arglen > PY_SSIZE_T_MAX / 2)
        return PyErr_NoMemory();

#if PY_MAJOR_VERSION >= 3
    retval = PyUnicode_New(arglen * 2, 127);
    if (!retval)
            return NULL;
    retbuf = PyUnicode_1BYTE_DATA(retval);
#else
    retval = PyString_FromStringAndSize(NULL, arglen * 2);
    if (!retval)
            return NULL;
    retbuf = PyString_AsString(retval);
    if (!retbuf) {
            Py_DECREF(retval);
            return NULL;
    }
#endif
    /* make hex version of string, taken from shamodule.c */
    for (i=j=0; i < arglen; i++) {
        unsigned char c;
        c = (argbuf[i] >> 4) & 0xf;
        retbuf[j++] = hexdigits[c];
        c = argbuf[i] & 0xf;
        retbuf[j++] = hexdigits[c];
    }

    return retval;
}
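
For reference, the nibble-lookup loop above can be mirrored in pure Python. This is only a sketch for checking the expected output; the name strhex is hypothetical and not part of pysha3.

```python
HEXDIGITS = "0123456789abcdef"

def strhex(buf: bytes) -> str:
    """Return the lowercase hex representation of buf, two digits per byte."""
    out = []
    for byte in buf:
        out.append(HEXDIGITS[(byte >> 4) & 0xF])  # high nibble first
        out.append(HEXDIGITS[byte & 0xF])         # then low nibble
    return "".join(out)

# Gives the same text as the built-in bytes.hex() on Python 3.5+.
print(strhex(b"\x00\x7f\xff"))
```

Comparing this against the C function's result for the same input is a quick way to spot a wrong digit or length.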

Cython's compatibility level is at 3.2 for PyPy, and PyUnicode_New was introduced in Python 3.3.

I tried the brute-force fix of replacing the whole file with the following Cython code:

cdef Py_strhex(const char* argbuf, const Py_ssize_t arglen):
    return (argbuf[:arglen]).hex()

but it triggers a segmentation fault, even when compiled and run with the official Python implementation. With the official PyPy binary, I don't have the debugging symbols for gdb, so I don't know why.

(gdb) bt
#0  0x00007ffff564cd00 in pypy_g_text_w__pypy_interpreter_baseobjspace_W_Root () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#1  0x00007ffff5d721a8 in pypy_g_getattr () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#2  0x00007ffff543a8bd in pypy_g_dispatcher_15 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#3  0x00007ffff5ab909b in pypy_g_wrapper_second_level.star_2_14 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#4  0x00007fffd7212372 in _Py_strhex.2738 () from /usr/lib64/pypy3.6-v7.2.0-linux64/site-packages/pysha3-1.0.3.dev1-py3.6-linux-x86_64.egg/_pysha3.pypy3-72-x86_64-linux-gnu.so
#5  0x00007fffd7217990 in _sha3_sha3_224_hexdigest_impl.2958 () from /usr/lib64/pypy3.6-v7.2.0-linux64/site-packages/pysha3-1.0.3.dev1-py3.6-linux-x86_64.egg/_pysha3.pypy3-72-x86_64-linux-gnu.so
#6  0x00007ffff5be2170 in pypy_g_generic_cpy_call__StdObjSpaceConst_funcPtr_SomeI_5 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#7  0x00007ffff54b25cd in pypy_g.call_1 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#8  0x00007ffff56715b9 in pypy_g_BuiltinCodePassThroughArguments1_funcrun_obj () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#9  0x00007ffff56ffc06 in pypy_g_call_valuestack__AccessDirect_None () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#10 0x00007ffff5edb29b in pypy_g_CALL_METHOD__AccessDirect_star_1 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so

Increasing the default Linux stack size to 65 MB doesn't change the recursion depth at which the segfault happens, so even though the call stack is more than 200 frames deep, this doesn't seem to be related to a stack overflow.

In terms of the Cython, it's simpler than you think:

cdef Py_strhex(const char* argbuf, const Py_ssize_t arglen):
    return (argbuf[:arglen]).hex()

Essentially you don't need the malloc (which was introducing a memory leak anyway because it was missing a free) and you don't need the memcpy. argbuf[:arglen] creates a bytes object with the appropriate length (making a copy of the data).

This definitely works on CPython. On PyPy2 it produces AttributeError: 'str' object has no attribute 'hex', which is correct for Python 2. I'd imagine that if it were going to produce a segmentation fault, it would happen before the AttributeError, so that's promising. I don't have PyPy3 readily available...
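
As a rough plain-Python analogy (a sketch using ctypes rather than Cython, so the buffer and variable names here are illustrative assumptions), copying a fixed number of bytes out of a C buffer and hex-encoding the copy looks like this:

```python
import ctypes

# A C buffer containing an embedded NUL byte, as raw hash output often does.
raw = ctypes.create_string_buffer(b"\x00\x01ab")

# string_at(ptr, size) copies exactly `size` bytes into a new bytes object,
# similar in spirit to argbuf[:arglen] in Cython.
copied = ctypes.string_at(raw, 4)

# bytes.hex() (Python 3.5+) then produces the two-digits-per-byte text.
hex_text = copied.hex()
print(hex_text)
```

Passing the length explicitly matters: without it, the copy would stop at the first NUL byte and the hex string would be truncated.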


Edit:

I've now managed to test my code on PyPy3 as follows:

# extra Cython code just to call the function
def test():
    cdef const char* a = "0123456789"
    return Py_strhex(a,10)

Then from Python:

import modulename
modulename.test()

This works fine without a segmentation fault; therefore I'm pretty convinced this code is fine.

I do not know how you're calling the Cython code since you do not say; however, Cython does not generate C code with the intention that you just copy out an individual function. It generates a module, and the module expects to be imported (some things are set up during the module import). Specifically, Cython sets up a table of strings during module initialization, including the string "hex" used to look up the attribute. To use this code correctly you'd need to ensure that the module containing it is imported first, rather than just dumping a copy of the generated Cython code into a C file. Doing this is a bit complicated in Python 3 and probably doesn't suit your purposes.

I'll leave this answer in its current state since I believe it's correct and the issues are occurring in the parts you don't specify. It's quite likely it isn't useful to you, and you're free to ignore it.

OK, I found what I was looking for with this variant. It won't work on all compilers and is compatible only with Python 3, but it brings partial PyPy compatibility (some tests which are supposed to fail succeed because an incorrect hash is returned) to pysha3, along with the programs that depend on it:

static PyObject * _Py_strhex(const char* argbuf, const Py_ssize_t arglen) {
    static const char *hexdigits = "0123456789abcdef";

    assert(arglen >= 0);

    if (arglen > PY_SSIZE_T_MAX / 2)
        return PyErr_NoMemory();

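    const Py_ssize_t len=arglen*2;
    char retbuf[len+1];  /* variable-length array: requires compiler VLA support */
    retbuf[len]=0;       /* terminator goes at index len; len+1 is out of bounds */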

    /* make hex version of string, taken from shamodule.c */
    for (Py_ssize_t i=0,j=0; i < arglen; i++) {
        retbuf[j++] = hexdigits[(argbuf[i] >> 4) & 0xf];
        retbuf[j++] = hexdigits[argbuf[i] & 0xf];
    }

    return PyUnicode_FromStringAndSize(retbuf,len);
}
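
Since an incorrect hash can slip through, a quick sanity check is to compare hexdigest() with the hex encoding of the raw digest(). This is a minimal sketch assuming Python 3.6+, where hashlib ships SHA-3; check_hexdigest is a hypothetical helper, and the same check could be pointed at the patched extension's constructor instead of hashlib's.

```python
import hashlib

def check_hexdigest(constructor, data: bytes) -> bool:
    """Return True if hexdigest() agrees with hex-encoding the raw digest()."""
    h = constructor(data)
    return h.hexdigest() == h.digest().hex()

# Demonstrated here with the standard library's SHA3-224 implementation.
print(check_hexdigest(hashlib.sha3_224, b"abc"))
```

A mismatch would point at the hex conversion rather than at the hashing itself, because both values come from the same digest.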
