简体   繁体   中英

Why can I write and read memory when I haven't allocated space?

I'm trying to build my own Hash Table in C from scratch as an exercise and I'm doing one little step at a time. But I'm having a little issue...

I'm declaring the Hash Table structure as pointer so I can initialize it with the size I want and increase it's size whenever the load factor is high.

The problem is that I'm creating a table with only 2 elements (it's just for testing purposes), I'm allocating memory for just those 2 elements but I'm still able to write to memory locations that I shouldn't. And I also can read memory locations that I haven't written to.

Here's my current code:

#include <stdio.h>
#include <stdlib.h>


#define HASHSIZE 2


typedef char *HashKey;
typedef int HashValue;

typedef struct sHashTable {
    HashKey key;
    HashValue value;
} HashEntry;

typedef HashEntry *HashTable;


void hashInsert(HashTable table, HashKey key, HashValue value) {
}

void hashInitialize(HashTable *table, int tabSize) {
    *table = malloc(sizeof(HashEntry) * tabSize);

    if(!*table) {
        perror("malloc");
        exit(1);
    }

    (*table)[0].key = "ABC";
    (*table)[0].value = 45;
    (*table)[1].key = "XYZ";
    (*table)[1].value = 82;
    (*table)[2].key = "JKL";
    (*table)[2].value = 13;
}


int main(void) {
    HashTable t1 = NULL;

    hashInitialize(&t1, HASHSIZE);

    printf("PAIR(%d): %s, %d\n", 0, t1[0].key, t1[0].value);
    printf("PAIR(%d): %s, %d\n", 1, t1[1].key, t1[1].value);
    printf("PAIR(%d): %s, %d\n", 3, t1[2].key, t1[2].value);
    printf("PAIR(%d): %s, %d\n", 3, t1[3].key, t1[3].value);

    return 0;
}

You can easily see that I haven't allocated space for (*table)[2].key = "JKL"; nor (*table)[2].value = 13; . I also shouldn't be able read the memory locations in the last 2 printfs in main() .

Can someone please explain this to me and if I can/should do anything about it?

EDIT:
Ok, I've realized a few things about my code above, which is a mess... But I have a class right now and can't update my question. I'll update this when I have the time. Sorry about that.

EDIT 2:
I'm sorry, but I shouldn't have posted this question because I don't want my code like I posted above. I want to do things slightly different which makes this question a bit irrelevant. So, I'm just going to assume this was question that I needed an answer for and accept one of the correct answers below. I'll then post my proper questions...

Just don't do it, it's undefined behavior.

It might accidentially work because you write/read some memory the program doesn't actually use. Or it can lead to heap corruption because you overwrite metadata used by the heap manager for its purposes. Or you can overwrite some other unrelated variable and then have hard times debugging the program that goes nuts because of that. Or anything else harmful - either obvious or subtle yet severe - can happen.

Just don't do it - only read/write memory you legally allocated.

Generally speaking (different implementation for different platforms) when a malloc or similar heap based allocation call is made, the underlying library translates it into a system call. When the library does that, it generally allocates space in sets of regions - which would be equal or larger than the amount the program requested.

Such an arrangement is done so as to prevent frequent system calls to kernel for allocation, and satisfying program requests for Heap faster (This is certainly not the only reason!! - other reasons may exist as well).

Fall through of such an arrangement leads to the problem that you are observing. Once again, its not always necessary that your program would be able to write to a non-allocated zone without crashing/seg-faulting everytime - that depends on particular binary's memory arrangement. Try writing to even higher array offset - your program would eventually fault.

As for what you should/should-not do - people who have responded above have summarized fairly well. I have no better answer except that such issues should be prevented and that can only be done by being careful while allocating memory.

One way of understanding is through this crude example: When you request 1 byte in userspace, the kernel has to allocate a whole page atleast (which would be 4Kb on some Linux systems, for example - the most granular allocation at kernel level). To improve efficiency by reducing frequent calls, the kernel assigns this whole page to the calling Library - which the library can allocate as when more requests come in. Thus, writing or reading requests to such a region may not necessarily generate a fault. It would just mean garbage.

In C, you can read to any address that is mapped, you can also write to any address that is mapped to a page with read-write areas.

In practice, the OS gives a process memory in chunks (pages) of normally 8K (but this is OS-dependant). The C library then manages these pages and maintains lists of what is free and what is allocated, giving the user addresses of these blocks when asked to with malloc.

So when you get a pointer back from malloc(), you are pointing to an area within an 8k page that is read-writable. This area may contain garbage, or it contain other malloc'd memory, it may contain the memory used for stack variables, or it may even contain the memory used by the C library to manage the lists of free/allocated memory!

So you can imagine that writing to addresses beyond the range you have malloc'ed can really cause problems:

  • Corruption of other malloc'ed data
  • Corruption of stack variables, or the call stack itself, causing crashes when a function return's
  • Corruption of the C-library's malloc/free management memory, causing crashes when malloc() or free() are called

All of which are a real pain to debug, because the crash usually occurs much later than when the corruption occurred.

Only when you read or write from/to the address which does not correspond to a mapped page will you get a crash... eg reading from address 0x0 (NULL)

Malloc, Free and pointers are very fragile in C (and to a slightly lesser degree in C++), and it is very easy to shoot yourself in the foot accidentally

There are many 3rd party tools for memory checking which wrap each memory allocation/free/access with checking code. They do tend to slow your program down, depending on how much checking is applied..

Think of memory as being a great big blackboard divided into little squares. Writing to a memory location is equivalent to erasing a square and writing a new value there. The purpose of malloc generally isn't to bring memory (blackboard squares) into existence; rather, it's to identify an area of memory (group of squares) that's not being used for anything else, and take some action to ensure that it won't be used for anything else until further notice. Historically, it was pretty common for microprocessors to expose all of the system's memory to an application. An piece of code Foo could in theory pick an arbitrary address and store its data there, but with a couple of major caveats:

  1. Some other code `Bar` might have previously stored something there with the expectation that it would remain. If `Bar` reads that location expecting to get back what it wrote, it will erroneously interpret the value written by `Foo` as its own. For example, if `Bar` had stored the number of widgets that were received (23), and `Foo` stored the value 57, the earlier code would then believe it had received 57 widgets.
  2. If `Foo` expects the data it writes to remain for any significant length of time, its data might get overwritten by some other code (basically the flip-side of the above).

Newer systems include more monitoring to keep track of what processes own what areas of memory, and kill off processes that access memory that they don't own. In many such systems, each process will often start with a small blackboard and, if attempts are made to malloc more squares than are available, processes can be given new chunks of blackboard area as needed. Nonetheless, there will often be some blackboard area available to each process which hasn't yet been reserved for any particular purposes. Code could in theory use such areas to store information without bothering to allocate it first, and such code would work if nothing happened to use the memory for any other purpose, but there would be no guarantee that such memory areas wouldn't be used for some other purpose at some unexpected time.

Usually malloc will allocate more memory than you require to for alignment purpose. Also because the process really have read/write access to the heap memory region. So reading a few bytes outside of the allocated region seldom trigger any errors.

But still you should not do it . Since the memory you're writing to can be regarded as unoccupied or is in fact occupied by others, anything can happen eg the 2nd and 3rd key/value pair will become garbage later or an irrelevant vital function will crash due to some invalid data you've stomped onto its malloc -ed memory.

(Also, either use char[≥4] as the type of key or malloc the key, because if the key is unfortunately stored on the stack it will become invalid later.)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM