Critique my non-intrusive heap debugger

Question

This is a follow-up to "Critique my heap debugger" from yesterday. As suggested by bitc, I now keep metadata about the allocated blocks in a separate handwritten hashtable.

The heap debugger now detects the following kinds of errors:

memory leaks (now with more verbose debugging output)
illegal pointers passed to delete (that also takes care of double deletes)
wrong form of delete (array vs. non-array)
buffer overflows
buffer underflows

Feel free to discuss and thanks in advance!

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <new>

namespace
{
    // I don't want to #include <algorithm> for a single function template :)
    template <typename T>
    void my_swap(T& x, T& y)
    {
        T z(x);
        x = y;
        y = z;
    }

    typedef unsigned char byte;

    const byte CANARY[] = {0x5A, 0xFE, 0x6A, 0x8D,
                           0x5A, 0xFE, 0x6A, 0x8D,
                           0x5A, 0xFE, 0x6A, 0x8D,
                           0x5A, 0xFE, 0x6A, 0x8D};

    bool canary_dead(const byte* cage)
    {
        bool dead = memcmp(cage, CANARY, sizeof CANARY) != 0;
        if (dead)
        {
            for (size_t i = 0; i < sizeof CANARY; ++i)
            {
                byte b = cage[i];
                printf(b == CANARY[i] ? "__ " : "%02X ", b);
            }
            putchar('\n');
        }
        return dead;
    }

    enum kind_of_memory {AVAILABLE, TOMBSTONE, NON_ARRAY_MEMORY, ARRAY_MEMORY};

    const char* kind_string[] = {0, 0, "non-array memory", "    array memory"};

    struct metadata
    {
        byte* address;
        size_t size;
        kind_of_memory kind;

        bool in_use() const
        {
            return kind & 2;
        }

        void print() const
        {
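            // %d does not match size_t; cast to unsigned long and print with %lu
            printf("%s at %p (%lu bytes)\n", kind_string[kind],
                   static_cast<void*>(address), static_cast<unsigned long>(size));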
        }

        bool must_keep_searching_for(void* address)
        {
            return kind == TOMBSTONE || (in_use() && address != this->address);
        }

        bool canaries_alive() const
        {
            bool alive = true;
            if (canary_dead(address - sizeof CANARY))
            {
                printf("ERROR:    buffer underflow at %p\n", static_cast<void*>(address));
                alive = false;
            }
            if (canary_dead(address + size))
            {
                printf("ERROR:     buffer overflow at %p\n", static_cast<void*>(address));
                alive = false;
            }
            return alive;
        }
    };

    const size_t MINIMUM_CAPACITY = 11;

    class hashtable
    {
        metadata* data;
        size_t used;
        size_t capacity;
        size_t tombstones;

    public:

        size_t size() const
        {
            return used - tombstones;
        }

        void print() const
        {
            for (size_t i = 0; i < capacity; ++i)
            {
                if (data[i].in_use())
                {
                    printf(":( leaked ");
                    data[i].print();
                }
            }
        }

        hashtable()
        {
            used = 0;
            capacity = MINIMUM_CAPACITY;
            data = static_cast<metadata*>(calloc(capacity, sizeof(metadata)));
            tombstones = 0;
        }

        ~hashtable()
        {
            free(data);
        }

        hashtable(const hashtable& that)
        {
            used = 0;
            capacity = 3 * that.size() | 1;
            if (capacity < MINIMUM_CAPACITY) capacity = MINIMUM_CAPACITY;
            data = static_cast<metadata*>(calloc(capacity, sizeof(metadata)));
            tombstones = 0;

            for (size_t i = 0; i < that.capacity; ++i)
            {
                if (that.data[i].in_use())
                {
                    insert_unsafe(that.data[i]);
                }
            }
        }

        hashtable& operator=(hashtable copy)
        {
            swap(copy);
            return *this;
        }

        void swap(hashtable& that)
        {
            my_swap(data, that.data);
            my_swap(used, that.used);
            my_swap(capacity, that.capacity);
            my_swap(tombstones, that.tombstones);
        }

        void insert_unsafe(const metadata& x)
        {
            *find(x.address) = x;
            ++used;
        }

        void insert(const metadata& x)
        {
            if (2 * used >= capacity)
            {
                hashtable copy(*this);
                swap(copy);
            }
            insert_unsafe(x);
        }

        metadata* find(void* address)
        {
            size_t index = reinterpret_cast<size_t>(address) % capacity;
            while (data[index].must_keep_searching_for(address))
            {
                ++index;
                if (index == capacity) index = 0;
            }
            return &data[index];
        }

        void erase(metadata* it)
        {
            it->kind = TOMBSTONE;
            ++tombstones;
        }
    } the_hashset;

    struct heap_debugger
    {
        heap_debugger()
        {
            puts("heap debugger started");
        }

        ~heap_debugger()
        {
            the_hashset.print();
            puts("heap debugger shutting down");
        }
    } the_heap_debugger;

    void* allocate(size_t size, kind_of_memory kind) throw (std::bad_alloc)
    {
        byte* raw = static_cast<byte*>(malloc(size + 2 * sizeof CANARY));
        if (raw == 0) throw std::bad_alloc();

        memcpy(raw, CANARY, sizeof CANARY);
        byte* payload = raw + sizeof CANARY;
        memcpy(payload + size, CANARY, sizeof CANARY);

        metadata md = {payload, size, kind};
        the_hashset.insert(md);
        printf("allocated ");
        md.print();
        return payload;
    }

    void release(void* payload, kind_of_memory kind) throw ()
    {
        if (payload == 0) return;

        metadata* p = the_hashset.find(payload);

        if (!p->in_use())
        {
            printf("ERROR:   no dynamic memory at %p\n", payload);
        }
        else if (p->kind != kind)
        {
            printf("ERROR:wrong form of delete at %p\n", payload);
        }
        else if (p->canaries_alive())
        {
            printf("releasing ");
            p->print();
            free(static_cast<byte*>(payload) - sizeof CANARY);
            the_hashset.erase(p);
        }
    }
}

void* operator new(size_t size) throw (std::bad_alloc)
{
    return allocate(size, NON_ARRAY_MEMORY);
}

void* operator new[](size_t size) throw (std::bad_alloc)
{
    return allocate(size, ARRAY_MEMORY);
}

void operator delete(void* payload) throw ()
{
    release(payload, NON_ARRAY_MEMORY);
}

void operator delete[](void* payload) throw ()
{
    release(payload, ARRAY_MEMORY);
}

int main()
{
    int* p = new int[1];
    delete p;   // wrong form of delete
    delete[] p; // ok
    delete p;   // no dynamic memory (double delete)

    p = new int[1];
    p[-1] = 0xcafebabe;
    p[+1] = 0x12345678;
    delete[] p; // underflow and overflow prevent release
                // p is not released, hence leak
}

Answer 1

Very nice, indeed. Your canaries could actually reveal some real cases of overflow/underflow (though not all of them, as Matthieu pointed out).

What else? You might run into some problems with a multi-threaded application. Perhaps protect the hashtable from concurrent access?
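To make the concurrency suggestion concrete, here is a minimal sketch of a table whose operations are serialized by a mutex. It assumes C++11 (`std::mutex`, `std::lock_guard`), which the question's code does not use, and `locked_registry` is a hypothetical stand-in for the question's hashtable, not a drop-in replacement. It uses a fixed array so that the table itself never calls operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the question's hashtable: a fixed array of
// tracked addresses, so the sketch itself never allocates.
class locked_registry
{
    static const std::size_t CAPACITY = 64;
    const void* slots[CAPACITY];
    std::size_t count;
    mutable std::mutex guard;   // serializes every access to slots and count

public:
    locked_registry() : slots(), count(0) {}

    // Returns false when the registry is full.
    bool insert(const void* address)
    {
        assert(address != 0);   // null pointers are never tracked
        std::lock_guard<std::mutex> lock(guard);
        if (count == CAPACITY) return false;
        slots[count++] = address;
        return true;
    }

    // Returns false when the address was not tracked (e.g. a double delete).
    bool erase(const void* address)
    {
        std::lock_guard<std::mutex> lock(guard);
        for (std::size_t i = 0; i < count; ++i)
        {
            if (slots[i] == address)
            {
                slots[i] = slots[--count];   // fill the hole with the last entry
                return true;
            }
        }
        return false;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(guard);
        return count;
    }
};
```

In the real debugger the lock would probably have to cover the whole of allocate() and release(), since the lookup, the canary check and the erase belong together.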

Now that you log every allocation and deallocation, you can (if you like) provide more information about the program being tested. It might be interesting to know the total and average number of allocations at any given time, as well as the total, maximum, minimum and average bytes allocated, and the average lifespan of allocations.
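As a rough sketch of such bookkeeping, the counters below could be updated from allocate() and release(). The struct `allocation_stats` and its member names are hypothetical, and lifespan tracking is left out because it would need a timestamp per block.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counters that allocate() and release() could update.
struct allocation_stats
{
    std::size_t live_blocks;        // blocks currently allocated
    std::size_t live_bytes;         // payload bytes currently allocated
    std::size_t total_allocations;  // allocations since program start
    std::size_t total_bytes;        // payload bytes requested since start
    std::size_t peak_bytes;         // highest value live_bytes ever reached
    std::size_t smallest;           // smallest request seen (0 until the first)
    std::size_t largest;            // largest request seen

    allocation_stats()
        : live_blocks(0), live_bytes(0), total_allocations(0), total_bytes(0),
          peak_bytes(0), smallest(0), largest(0) {}

    void record_allocation(std::size_t size)
    {
        if (total_allocations == 0 || size < smallest) smallest = size;
        if (size > largest) largest = size;
        ++total_allocations;
        total_bytes += size;
        ++live_blocks;
        live_bytes += size;
        if (live_bytes > peak_bytes) peak_bytes = live_bytes;
    }

    void record_release(std::size_t size)
    {
        assert(live_blocks > 0 && live_bytes >= size);
        --live_blocks;
        live_bytes -= size;
    }

    // Mean request size in bytes; 0 when nothing has been allocated yet.
    double average_size() const
    {
        if (total_allocations == 0) return 0.0;
        return static_cast<double>(total_bytes) / total_allocations;
    }
};
```

The size passed to record_release() would come from the metadata entry that release() already looks up.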

If you want to compare different threads, at least with Pthreads you can identify them with pthread_self(). This heap debugger could become quite a useful analysis tool.

Answer 2

Are you using a very weak malloc that doesn't already have this sort of stuff built into it? Because if it's there, you are doubling the overhead for little gain. Also, this kind of system really hurts when doing small object allocation, and it is ineffective when people do one big allocation and manage the memory themselves.

As far as the code is concerned, it looks like it will do what you say it will do, and it looks well designed and is easy to read. But if you are going to go through the trouble of doing this, why not catch your buffer overflows and underflows at the source by using managed containers/pointers/operator[] thingies? That way, you can debug at the spot of the failure instead of finding out at free that something evil has occurred.
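One way to read "catch it at the source" is a container whose operator[] validates the index. The sketch below is only an illustration: `checked_array` and `access_is_rejected` are hypothetical names, and `std::vector::at` already offers the same check in the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical bounds-checked array: every access is validated on the spot.
template <typename T>
class checked_array
{
    std::vector<T> elements;

public:
    explicit checked_array(std::size_t count) : elements(count) {}

    std::size_t size() const { return elements.size(); }

    // Throws std::out_of_range at the faulty access itself.
    T& operator[](std::size_t index)
    {
        if (index >= elements.size())
            throw std::out_of_range("checked_array: index out of range");
        return elements[index];
    }
};

// Returns true when writing to `index` is rejected by the bounds check.
template <typename T>
bool access_is_rejected(checked_array<T>& array, std::size_t index)
{
    try
    {
        array[index] = T();
        return false;
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
}

// Usage: the faulty access is reported where it happens.
void checked_array_demo()
{
    checked_array<int> numbers(10);
    numbers[9] = 7;                            // last valid index
    assert(numbers[9] == 7);
    assert(access_is_rejected(numbers, 10));   // one past the end
}
```

Because the index is unsigned, a write at -1 converts to a huge value and is rejected by the same comparison.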

There are efficiencies to be had that I'm sure others will find, but these are just some thoughts off the top of my head after looking over your code for a few minutes.

Answer 3

I wonder about the detection of underflows / overflows.

I mean, if I have a 10-element array, then it seems you'll detect if I write at -1 and 10, but what if I write at 20? Underflows and overflows do not necessarily happen as part of a (contiguous) buffer overrun.
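The limitation can be simulated without invoking undefined behavior by keeping everything inside one array. In this sketch, which is an assumption-laden model rather than the question's code, `arena` and `stray_write` are hypothetical names and the canary is shortened to four bytes. The layout is canary, 10 payload bytes, canary, then slack that plays the role of a neighboring block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned char byte;

const byte CANARY[] = {0x5A, 0xFE, 0x6A, 0x8D};   // shortened for the sketch

// One arena simulates the heap: [canary][payload][canary][slack ...].
// All writes stay inside the arena, so the sketch has defined behavior.
struct arena
{
    static const std::size_t PAYLOAD = 10;
    static const std::size_t SLACK = 32;
    byte bytes[sizeof CANARY + PAYLOAD + sizeof CANARY + SLACK];

    arena()
    {
        std::memset(bytes, 0, sizeof bytes);
        std::memcpy(bytes, CANARY, sizeof CANARY);
        std::memcpy(payload() + PAYLOAD, CANARY, sizeof CANARY);
    }

    byte* payload() { return bytes + sizeof CANARY; }

    // True when both guard regions still hold the canary pattern.
    bool canaries_alive()
    {
        return std::memcmp(bytes, CANARY, sizeof CANARY) == 0
            && std::memcmp(payload() + PAYLOAD, CANARY, sizeof CANARY) == 0;
    }
};

// Writes one byte at a payload-relative index, which may lie past the payload.
void stray_write(arena& a, std::size_t index)
{
    assert(sizeof CANARY + index < sizeof a.bytes);   // stay inside the arena
    a.payload()[index] = 0xFF;
}
```

A write at index 10 lands on the trailing canary and is noticed, while a write at index 20 jumps over it into the slack and leaves both canaries intact.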

Furthermore, what's the point of preventing release of the block? This block is (relatively) fine; it's the neighbors you've (unfortunately) corrupted.

Anyway, it seems pretty fine to me, though I would probably have more than one return per function because there's no point in Single Exit. You seem more of a C programmer than a C++ one :)
