How to store PyObject* correctly in C?

I am writing a small caching library. A Python dict doesn't suit my needs, and when I tried std::map I got a SIGSEGV with very similar errors. The problem is shown in the logs below. What am I doing wrong? Is there another way to store objects in C?

Problem:

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/python3 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Python 3.9.2 (default, Feb 20 2021, 18:40:11) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from syncached import cache
>>> cache.push(1, object())
>>> cache.get(1) == object()
True
>>> cache.get(1) == object()

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7d049d0 in PyMem_Calloc () from /usr/lib/libpython3.9.so.1.0
(gdb) bt
#0  0x00007ffff7d049d0 in PyMem_Calloc () from /usr/lib/libpython3.9.so.1.0
#1  0x00007ffff7cfb27d in PyList_New () from /usr/lib/libpython3.9.so.1.0
#2  0x00007ffff7d6f4e3 in ?? () from /usr/lib/libpython3.9.so.1.0
#3  0x00007ffff7de2e37 in PyAST_CompileObject () from /usr/lib/libpython3.9.so.1.0
#4  0x00007ffff7de2c3b in ?? () from /usr/lib/libpython3.9.so.1.0
#5  0x00007ffff7cf68ab in ?? () from /usr/lib/libpython3.9.so.1.0
#6  0x00007ffff7cf6a63 in PyRun_InteractiveLoopFlags () from /usr/lib/libpython3.9.so.1.0
#7  0x00007ffff7c84f6b in PyRun_AnyFileExFlags () from /usr/lib/libpython3.9.so.1.0
#8  0x00007ffff7c7965c in ?? () from /usr/lib/libpython3.9.so.1.0
#9  0x00007ffff7dc9fa9 in Py_BytesMain () from /usr/lib/libpython3.9.so.1.0
#10 0x00007ffff7a46b25 in __libc_start_main () from /usr/lib/libc.so.6
#11 0x000055555555504e in _start ()

pyhashmap.c:

#include "Python.h"
#include <stdlib.h>

typedef struct {
    Py_hash_t key;
    PyObject *val;
} hashmap_member;

typedef struct {
    size_t cache_size;
    size_t currsize;
    hashmap_member *list;
} pyhashmap;

pyhashmap *new_map(size_t size){
    pyhashmap *map = PyMem_Malloc(sizeof(pyhashmap));
    map->cache_size = size;
    map->currsize = 0;
    map->list = PyMem_Malloc(size*sizeof(hashmap_member));
    return map;
}

void map_insert(pyhashmap *map, Py_hash_t key, PyObject *val){
    if (map->currsize == map->cache_size){
        return;
    }
    for (size_t i = 0; i < map->currsize; i++){
        if (map->list[i].key == key){
            return;
        }
    }
    map->list[map->currsize] = (hashmap_member) {.key = key, .val = val};
    map->currsize++;
}

PyObject *map_get(pyhashmap *map, Py_hash_t key){
    for (size_t i = 0; i < map->currsize; i++){
        if (map->list[i].key == key){
            return map->list[i].val;
        }
    }
    return Py_None;
}

ipyhashmap.pxd:

cdef extern from "pyhashmap.c":
    ctypedef struct pyhashmap
    pyhashmap *new_map(size_t)
    void map_insert(pyhashmap *, int, object)
    object map_get(pyhashmap *, int)

cache.pyx:

from syncached.ipyhashmap cimport pyhashmap, new_map, map_insert, map_get

cdef pyhashmap *map = new_map(5)

cpdef push(int key, object val):
    map_insert(map, key, val)

cpdef get(key):
    return map_get(map, key)

There is also a second problem:

>>> cache.push(3, {"a": "B"})
>>> cache.get(3)
{3: 3, ((<NULL>, 'get'), ('cache', 'get')): ((((((...), ()), None), (3, None)), 'get'), ('cache', 'get')), ((((((...), None), None), ((((...), 'get'), ((...), 'get')), None)), 'get'), ()): ((((((...), 'get'), None), (((...), 'get'), None)), 'get'), ()), ((((...), 'get'), None), (((...), 'get'), None)): ((((...), 'get'), None), (((...), 'get'), None)), 'Py_Repr': [{...}, [...]]}
>>> cache.get(3)
KeyError: 'unknown symbol table entry'
>>> cache.get(3)
[1]    21720 segmentation fault (core dumped)  python3

Option 1: Have another python object reference the stored python object.

The simplest approach I can recommend is to also keep every stored object in a Python list, dict, or set for as long as it is in your custom map. The container holds a reference to each object, so the reference count cannot fall to zero and the object will not be deallocated while your C code still points to it.

Option 2: Manage the stored python object reference count manually.

You can manage the reference count manually when dealing with pointers to Python objects (PyObject*). If you increment the reference count but don't decrement it the same number of times, the object will never be freed once it is no longer in use. The memory it occupies cannot be reclaimed by the application, i.e. it leaks. On the other hand, if you don't increment the reference count at all, the object can be freed while your C code is still referring to it.
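The leak case can be observed from Python itself. The sketch below assumes CPython and uses ctypes.pythonapi.Py_IncRef and Py_DecRef, the function forms of the Py_INCREF and Py_DECREF macros; Payload is an illustrative class, not something from the original code. The opposite case (a missing increment) is not shown because it would leave a dangling pointer and crash.

```python
import ctypes
import weakref

# Function forms of the Py_INCREF / Py_DECREF macros from the C API.
py_incref = ctypes.pythonapi.Py_IncRef
py_decref = ctypes.pythonapi.Py_DecRef
for func in (py_incref, py_decref):
    func.argtypes = [ctypes.py_object]
    func.restype = None


class Payload:
    """Illustrative value type that supports weak references."""


value = Payload()
probe = weakref.ref(value)

py_incref(value)   # extra reference that no Python name owns
del value          # the last named reference is gone ...
leaked = probe() is not None   # ... but the object is still allocated

py_decref(probe())  # balance the earlier increment
freed = probe() is None
```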

You can manage the references directly in C using Py_INCREF and Py_DECREF from Python's reference counting API (see the answer to a similar question here). If you are allowed to use C++ instead of C, RAII can make reference-count management simpler.
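Applied to the code in the question, this would presumably mean calling Py_INCREF(val) in map_insert before storing the pointer, and calling Py_INCREF on whatever map_get returns (including Py_None), since Cython treats a function declared in the .pxd as returning object as handing over a new reference. A removal function would call Py_DECREF. The sketch below mimics that ownership protocol from Python via ctypes, purely for illustration: it stores raw addresses, much like hashmap_member.val. RawPointerCache and its method names are hypothetical, and the code assumes CPython, where id() is the object's address.

```python
import ctypes

_incref = ctypes.pythonapi.Py_IncRef
_decref = ctypes.pythonapi.Py_DecRef
for _func in (_incref, _decref):
    _func.argtypes = [ctypes.py_object]
    _func.restype = None


class RawPointerCache:
    """Keeps only object addresses, so it must own a reference per entry."""

    def __init__(self):
        self._slots = {}  # key -> address of the stored object

    def push(self, key, val):
        if key in self._slots:
            return                     # same policy as map_insert: keep the first value
        _incref(val)                   # the cache now owns one reference
        self._slots[key] = id(val)     # CPython only: id() is the address

    def get(self, key):
        addr = self._slots.get(key)
        if addr is None:
            return None
        # Reading py_object.value yields a new reference for the caller,
        # which is what Py_INCREF before "return" achieves in C.
        return ctypes.cast(addr, ctypes.py_object).value

    def evict(self, key):
        addr = self._slots.pop(key, None)
        if addr is not None:
            # Give back the reference taken in push().
            _decref(ctypes.cast(addr, ctypes.py_object).value)


cache = RawPointerCache()
cache.push(3, {"a": "B"})      # the only reference is the one the cache owns
first = cache.get(3)
second = cache.get(3)          # repeated lookups stay valid
cache.evict(3)
```

Without the increment in push, the dict literal would be freed as soon as the call returned and get would read freed memory, which matches the corrupted output and the crash in the question.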
