The inner workings of glibc's free()

Question

For glibc 2.15 I was looking at malloc.c, specifically the free() function, and became confused about the unlink() macro. According to the source a chunk in use looks like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated              |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                           |
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

and a free()'d chunk looks like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                           |
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

When a used chunk is free()'d, free() takes the mem pointer it received as an argument and subtracts an offset from it to get a chunk pointer. There are a bunch of checks in between, but if the chunk wasn't mmapped, free() usually forward- or backward-consolidates it with another free chunk. Since the chunk being free()'d is already in a bin, it just searches within that particular bin for chunks to consolidate it with, correct? In the case of forward consolidation, the unlink() macro is applied to the chunk that follows the chunk being free()'d. I don't understand this, because when the next chunk (call it 'nextchunk') is unlinked, the following code runs:

    #define unlink(P, BK, FD) {                                            
    FD = P->fd;                                                          
    BK = P->bk;
    .
    .
    .
    FD->bk = BK;                                                       
    BK->fd = FD;
    .
    .
    .
                             }

How can BK->fd be referenced, considering that BK points to the chunk being free()'d, whose structure (as shown above) has no forward or backward pointer? I must have missed the part of the code where the fd and bk fields are added to the chunk being free()'d, but I can't find it. Can anyone help? Thanks.

Answer 1

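This line writes a forward pointer into the chunk that BK points to:

BK->fd = FD;

That chunk used to hold user data, but it is free now, so malloc is allowed to scribble over the memory as it sees fit. (Note that BK is loaded from nextchunk's bk field, so it is nextchunk's neighbour in the bin's list, which is not necessarily the chunk being freed. The chunk being freed gets its own fd and bk in the same way, when free() links it into a bin after consolidation.)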
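To make the list manipulation concrete, here is a minimal sketch of a circular doubly linked bin. It is not glibc's code: struct chunk is a simplified stand-in for struct malloc_chunk, and toy_unlink(), toy_insert() and demo_forward_consolidation() are hypothetical names invented for this illustration. The real unlink() also performs consistency checks that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for glibc's struct malloc_chunk (assumption:
   the real struct has more fields and packs flag bits into size). */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;   /* meaningful only while the chunk is free */
    struct chunk *bk;   /* meaningful only while the chunk is free */
};

/* The two stores at the heart of unlink(): splice p out of its list. */
void toy_unlink(struct chunk *p)
{
    struct chunk *fd = p->fd;
    struct chunk *bk = p->bk;
    fd->bk = bk;
    bk->fd = fd;
}

/* Link p in directly after the list head. */
void toy_insert(struct chunk *head, struct chunk *p)
{
    p->fd = head->fd;
    p->bk = head;
    head->fd->bk = p;
    head->fd = p;
}

/* Returns 1 if the list looks as expected after every step. */
int demo_forward_consolidation(void)
{
    struct chunk bin = {0}, next = {0}, freed = {0};

    bin.fd = bin.bk = &bin;     /* empty circular list */
    toy_insert(&bin, &next);    /* "nextchunk" was freed earlier */

    /* Forward consolidation: take nextchunk out of its bin. Here BK
       is the bin head, not the chunk that is being freed. */
    toy_unlink(&next);
    if (bin.fd != &bin || bin.bk != &bin)
        return 0;

    /* Only now is the (merged) freed chunk linked into a bin, and
       only now are its fd and bk fields written. */
    toy_insert(&bin, &freed);
    assert(freed.fd == &bin && freed.bk == &bin);
    return bin.fd == &freed && bin.bk == &freed;
}
```

Calling demo_forward_consolidation() from a main() walks through the sequence: the neighbouring free chunk is unlinked first, and the chunk being freed receives its own pointers only when it is inserted afterwards.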

If it helps, you can think of it like a union:

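union {
    struct {
        struct chunk *fd;        /* forward pointer, meaningful only while free */
        struct chunk *bk;        /* back pointer, meaningful only while free */
    } freed;
    unsigned char user_data[N];  /* N = usable size, meaningful only while allocated */
};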

In a union, you're allowed to write into any member, but as a rule of thumb you should only rely on the value of the most recently written member. So when free is called, data is written to fd and bk, which is okay; the only consequence is that user_data might now contain garbage. Conversely, while the chunk holds user data (not free), the fd and bk pointers are garbage, since they alias user_data.

(Technically, you can always read from user_data no matter what aliases it, since it's an unsigned char array, but that isn't really relevant.)
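The aliasing can be observed directly. The following is a small sketch built on the union above, with some assumptions added for illustration: the payload size N is fixed at 32, struct chunk is a placeholder type, and fd_aliases_user_data() is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct chunk { size_t size; };   /* placeholder type, for illustration only */

#define N 32                     /* assumed payload size */

union payload {
    struct {
        struct chunk *fd;
        struct chunk *bk;
    } freed;
    unsigned char user_data[N];
};

/* Returns 1 if storing fd overwrote the first bytes of user_data. */
int fd_aliases_user_data(void)
{
    static struct chunk other;   /* something for fd to point at */
    struct chunk *ptr = &other;
    union payload p;

    assert(sizeof p.user_data >= sizeof p.freed);

    memset(p.user_data, 0xAA, sizeof p.user_data);  /* in use: user bytes */
    p.freed.fd = ptr;                               /* freed: allocator stores fd */

    /* Inspecting the bytes through unsigned char is always permitted. */
    return memcmp(p.user_data, &ptr, sizeof ptr) == 0;
}
```

The function returns 1 because freed.fd and user_data both start at offset 0 of the union, so the pointer's bytes land on top of what used to be user data.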

Update: This is low-level C code. You'd expect low-level C code in a malloc implementation. The idea that a field exists or does not exist does not make sense in low-level code, since we are casting to and from different types and allowing pointers to alias each other.

In low-level code, a field is just a memory offset. On my system, with the union above, the fd field has an offset of 0, and the bk field has an offset of 8 or 4, depending on the architecture I compile for. (In glibc's actual struct malloc_chunk the two size fields come first, so fd and bk sit that much further from the chunk pointer.) So the following code:

BK->fd = FD;

means "write the value FD to the memory location BK + 0". If you think of BK->fd as just a location in memory, it may help you understand how free works. (It's not actually just a location in memory, since at compile time there is also type information and aliasing rules.)
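The "field is an offset" view can be spelled out with offsetof. This is a sketch under assumptions: struct chunk is a simplified layout modelled on the diagrams in the question, and store_fd_by_offset() and toy_mem2chunk() are hypothetical helpers (the second one is written in the spirit of glibc's mem2chunk() macro, which steps back over the two size fields).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified chunk layout (assumption: mirrors the diagrams in the
   question, not the exact glibc definition). */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;
    struct chunk *bk;
};

/* Same effect as `bk->fd = fd;`, written as "store at base + offset". */
void store_fd_by_offset(struct chunk *bk, struct chunk *fd)
{
    unsigned char *base = (unsigned char *)bk;
    memcpy(base + offsetof(struct chunk, fd), &fd, sizeof fd);
}

/* mem -> chunk: user data begins where fd would live in a free chunk,
   so stepping back by that offset yields the chunk header. */
struct chunk *toy_mem2chunk(void *mem)
{
    assert(mem != NULL);
    return (struct chunk *)((unsigned char *)mem - offsetof(struct chunk, fd));
}
```

After store_fd_by_offset(&a, &b), reading a.fd yields &b, exactly as if the assignment had been written with the arrow operator. Likewise, toy_mem2chunk(&a.fd) gives back &a, which is the same subtraction free() performs on the pointer it receives.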

Understanding low-level C: If you want to understand low-level C code, it helps immensely to understand assembly language. It's not necessary, but it helps. It doesn't matter which one you learn (x86, MIPS, PowerPC, ARM, etc.), and you don't need to learn much, just a little bit. You can learn MIPS even if you never use MIPS; in fact, MIPS is probably easier to learn than x86.

Just learn enough assembly that you can translate a small piece of C code into assembly so you can understand what it's doing under the hood. That one line of C code above probably translates into one line of assembly code, because it's so simple.

And try not to think about assembly much when you're writing C. When you're writing C, the compiler's writing assembly, which means you aren't writing assembly.

Answer 1: accepted 2012-05-18 06:46:38

The inner workings of glibc's free()

Question

1 answers

solution1 1 ACCPTED 2012-05-18 06:46:38

solution1
1 ACCPTED 2012-05-18 06:46:38