
Memory leaks in Python when using an external C DLL

I have a Python module that calls a DLL written in C to encode XML strings. Once the function returns the encoded string, it fails to deallocate the memory that was allocated during this step. Concretely:

encodeMyString = ctypes.create_string_buffer(4096)

CallEncodingFuncInDLL(encodeMyString, InputXML)

I have looked at several related questions and have also tried calling gc.collect(), but since the memory was allocated inside an external DLL, Python's garbage collector presumably has no record of it and cannot reclaim it. Because the code keeps calling the encoding function, the process keeps allocating memory and eventually crashes. Is there a way to profile this memory usage?

Since you haven't given any information about the DLL, this will necessarily be pretty vague, but…

Python can't track memory allocated by something external that it doesn't know about. How could it? That memory could be part of the DLL's constant segment, or allocated with mmap or VirtualAlloc, or part of a larger object, or the DLL could just be expecting it to be alive for its own use.
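To see this concretely, here is a minimal sketch. It assumes a Unix-like system where the C library can be loaded through ctypes.util.find_library, and the helper name traced_growth_for_native_alloc is made up for illustration. It allocates 10 MiB with the C malloc and checks how much of that Python's own tracemalloc profiler notices:

```python
import ctypes
import ctypes.util
import tracemalloc

# Assumption: a Unix-like system; on Windows you would load your own DLL instead.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p   # keep the full 64-bit address
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def traced_growth_for_native_alloc(nbytes):
    """Return how many bytes tracemalloc attributes to a native malloc of nbytes."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    ptr = libc.malloc(nbytes)            # allocated by C, not by Python's allocator
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    libc.free(ptr)                       # only an explicit free releases it
    return after - before


growth = traced_growth_for_native_alloc(10 * 1024 * 1024)
print(growth)  # far smaller than the 10 MiB that was actually allocated
```

So Python-level tools such as gc and tracemalloc will not show a leak of this kind. Watching the process's memory from the outside (Task Manager, or a process monitor) while calling the function in a loop is usually a more telling check.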

Any DLL that has a function that allocates and returns a new object has to have a function that deallocates that object. For example, if CallEncodingFuncInDLL returns a new object that you're responsible for, there will be a function like DestroyEncodedThingInDLL that takes such an object and deallocates it.

So, when do you call this function?


Let's step back and make this more concrete. Let's say the function is plain old strdup, so the function you call to free up the memory is free. You have two choices for when to call free. No, I have no idea why you'd ever want to call strdup from Python, but it's about the simplest possible example, so let's pretend it's not useless.


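The first option is to call strdup, immediately convert the returned value to a native Python object and free it, and not have to worry about it after that:

# keep the raw address: the default restype (C int) would truncate it on 64-bit
libc.strdup.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
newbuf = libc.strdup(mybuf)
s = ctypes.string_at(newbuf)  # copy the C string into a Python bytes object
libc.free(newbuf)
# now use s, which is just a Python bytes object, so it's GC-able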

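Or, better, wrap this up so it's automatic. A plain Python callable assigned as restype only ever receives the result as a C int, which truncates pointers on 64-bit platforms, so attach the conversion as an errcheck hook instead:

def convert_and_free_char_p(result, func, args):
    # result is the raw address (or None for NULL), because restype is c_void_p
    if not result:
        raise MemoryError("strdup returned NULL")
    try:
        return ctypes.string_at(result)
    finally:
        libc.free(result)
libc.strdup.restype = ctypes.c_void_p
libc.strdup.errcheck = convert_and_free_char_p
libc.free.argtypes = [ctypes.c_void_p]

s = libc.strdup(mybuf)
# now use s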

But some objects can't be converted to a native Python object so easily. Or they can be, but it's not very useful to do so, because you need to keep passing them back into the DLL. In that case, you can't clean it up until you're done with it.

The best way to do this is to wrap that opaque value up in a class that releases it on close or __exit__ or __del__ or whatever seems appropriate. One nice way to do this is with @contextmanager:

import contextlib

@contextlib.contextmanager
def freeing(value):
    try:
        yield value
    finally:
        libc.free(value)

So:

newbuf = libc.strdup(mybuf)
with freeing(newbuf):
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# automatically freed before you get here
# (or even if you don't, because of an exception/return/etc.)

Or:

@contextlib.contextmanager
def strduping(buf):
    value = libc.strdup(buf)
    try:
        yield value
    finally:
        libc.free(value)

And now:

with strduping(mybuf) as newbuf:
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# again, automatically freed here
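For the class-based variant mentioned earlier, here is a minimal sketch. The NativeString class is hypothetical, and the code assumes a Unix-like system where strdup and free are reachable through ctypes; with a real DLL you would substitute its own create and destroy functions. It uses weakref.finalize rather than __del__, so the buffer is freed at most once, whether that happens through close(), the with block, or garbage collection:

```python
import ctypes
import ctypes.util
import weakref

# Assumption: a Unix-like C library; swap in your own DLL and its free function.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p    # raw address, no automatic conversion
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


class NativeString:
    """Hypothetical owner of a strdup'd buffer."""

    def __init__(self, data):
        self._ptr = libc.strdup(data)
        if not self._ptr:
            raise MemoryError("strdup returned NULL")
        # The finalizer calls libc.free(ptr) at most once.
        self._finalizer = weakref.finalize(self, libc.free, self._ptr)

    @property
    def value(self):
        if not self._finalizer.alive:
            raise ValueError("buffer already freed")
        return ctypes.string_at(self._ptr)

    def close(self):
        self._finalizer()                # a no-op if it has already run

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


with NativeString(b"hello") as s:
    copied = s.value                     # a Python bytes copy of the C string
s.close()                                # safe: the buffer is not freed twice
```

The finalizer holds only the address, not the object itself, so it does not keep the wrapper alive.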
