简体   繁体   中英

Unexpected Behaviour from tcmalloc

I have been using tcmalloc for a few months in a large project, and so far I must say that I am pretty happy about it, most of all for its HeapProfiling features which allowed to track memory leaks and remove them.

In the past couple of weeks though we experienced random crashes in our application, and we could not find the source of the random crash. In a very particular situation, When the application crashed, we found ourselves with a completely corrupted stack for one of the application threads. Several times instead I found that threads were stuck in tcmalloc::PageHeap::AllocLarge(), but since I dont have debug symbols of tcmalloc linked, I could not understand what the issue was.

After nearly one week of investigation, today I tried the most simple of things: removed tcmalloc from linkage to avoid using it, just to see what happened. Well... I finally found out what the problem was, and the offending code looks very much like this:

   void AllocatingFunction()
   {
       Object object_on_stack;
       ProcessObject(&object_on_stack);

   }

   void ProcessObject(Object* object)
   {
       ...
       // Do Whatever
       ...
       delete object;
   }

Using libc the application still crashed but I finally saw that I was calling delete on an object which was allocated on the stack.

What I still cant figure out is why tcmalloc instead kept the application running regardless of this very risky (if not utterly wrong) object deallocation, and the double deallocation when object_on_stack goes out of scope when AllocatingFunction ends. The fact is that the offending code could be called repeatedly without any hint of the underlying abomination.

I know that memory deallocation is one of those "undefined behaviour" when not used properly, but my surprise is such a different behaviour between "standard" libc and tcmalloc.

Does anyone have some sort of explanation of insight, on why tcmalloc keeps the application running?

Thanks in advance :)

Have a Nice Day

very risky (if not utterly wrong) object deallocation

well, I disagree here, it is utterly wrong, and since you invoke UB, anything can happen.

It very much depends on what the tcmalloc code acutally does on deallocation, and how it uses the (possibly garbage) data around the stack at that location.

I have seen tcmalloc crash on such occasions too, as well as glibc going into an infinite loop. What you see is just coincidence.

Firstly, there was no double free in your case. When object_on_stack goes out of scope there is no free call, just the stack pointer is decreased (or rather increased, as stack grows down...).

Secondly, during delete TcMalloc should be able to recognize that the address from a stack does not belong to the program heap. Here is a part of free(ptr) implementation:

const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
Span* span = NULL;
size_t cl = Static::pageheap()->GetSizeClassIfCached(p);

if (cl == 0) {
    span = Static::pageheap()->GetDescriptor(p);
    if (!span) {
        // span can be NULL because the pointer passed in is invalid
        // (not something returned by malloc or friends), or because the
        // pointer was allocated with some other allocator besides
        // tcmalloc.  The latter can happen if tcmalloc is linked in via
        // a dynamic library, but is not listed last on the link line.
        // In that case, libraries after it on the link line will
        // allocate with libc malloc, but free with tcmalloc's free.
        (*invalid_free_fn)(ptr);  // Decide how to handle the bad free request
        return;
    }

Call to invalid_free_fn crashes.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM