Why can I write and read memory when I haven't allocated space?
I'm trying to build my own hash table in C from scratch as an exercise, and I'm doing it one little step at a time. But I'm having a little issue...
I'm declaring the hash table structure as a pointer so I can initialize it with the size I want and increase its size whenever the load factor is high.
The problem is that I'm creating a table with only 2 elements (it's just for testing purposes). I'm allocating memory for just those 2 elements, but I'm still able to write to memory locations that I shouldn't. I can also read memory locations that I haven't written to.
Here's my current code:
#include <stdio.h>
#include <stdlib.h>

#define HASHSIZE 2

typedef char *HashKey;
typedef int HashValue;

typedef struct sHashTable {
    HashKey key;
    HashValue value;
} HashEntry;

typedef HashEntry *HashTable;

void hashInsert(HashTable table, HashKey key, HashValue value) {
}

void hashInitialize(HashTable *table, int tabSize) {
    *table = malloc(sizeof(HashEntry) * tabSize);
    if(!*table) {
        perror("malloc");
        exit(1);
    }
    (*table)[0].key = "ABC";
    (*table)[0].value = 45;
    (*table)[1].key = "XYZ";
    (*table)[1].value = 82;
    (*table)[2].key = "JKL";
    (*table)[2].value = 13;
}

int main(void) {
    HashTable t1 = NULL;
    hashInitialize(&t1, HASHSIZE);
    printf("PAIR(%d): %s, %d\n", 0, t1[0].key, t1[0].value);
    printf("PAIR(%d): %s, %d\n", 1, t1[1].key, t1[1].value);
    printf("PAIR(%d): %s, %d\n", 2, t1[2].key, t1[2].value);
    printf("PAIR(%d): %s, %d\n", 3, t1[3].key, t1[3].value);
    return 0;
}
You can easily see that I haven't allocated space for (*table)[2].key = "JKL"; nor for (*table)[2].value = 13;. I also shouldn't be able to read the memory locations in the last 2 printfs in main().
Can someone please explain this to me, and tell me whether I can/should do anything about it?
EDIT:
OK, I've realized a few things about my code above, which is a mess... But I have a class right now and can't update my question. I'll update this when I have the time. Sorry about that.
EDIT 2:
I'm sorry, but I shouldn't have posted this question, because I don't want my code to look like what I posted above. I want to do things slightly differently, which makes this question a bit irrelevant. So I'm just going to assume this was a question that I needed an answer for and accept one of the correct answers below. I'll then post my proper questions...
Just don't do it; it's undefined behavior.
It might accidentally work because you write/read some memory the program doesn't actually use. Or it can lead to heap corruption because you overwrite metadata that the heap manager uses for its own purposes. Or you can overwrite some other unrelated variable and then have a hard time debugging a program that goes nuts because of that. Or anything else harmful, either obvious or subtle yet severe, can happen.
Just don't do it: only read/write memory you legally allocated.
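One way to stay inside the memory you allocated is to keep the capacity next to the pointer and check every index against it. The following is only a minimal sketch of that idea; the names Entry, Table, table_init, table_set and table_free are made up for illustration and are not part of the code in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *key;
    int value;
} Entry;

/* Hypothetical wrapper: the capacity travels with the pointer. */
typedef struct {
    Entry *entries;
    size_t capacity;
} Table;

/* Allocates room for `capacity` entries; returns false on failure. */
bool table_init(Table *t, size_t capacity) {
    t->entries = calloc(capacity, sizeof *t->entries);
    t->capacity = t->entries ? capacity : 0;
    return t->entries != NULL;
}

/* Stores a pair only if `index` lies inside the allocation. */
bool table_set(Table *t, size_t index, const char *key, int value) {
    if (index >= t->capacity)
        return false;   /* refuse instead of writing out of bounds */
    t->entries[index].key = key;
    t->entries[index].value = value;
    return true;
}

/* Releases the entries and resets the table to an empty state. */
void table_free(Table *t) {
    free(t->entries);
    t->entries = NULL;
    t->capacity = 0;
}
```

With a table initialized for 2 entries, table_set succeeds for indices 0 and 1 and returns false for index 2, so the third write from the question is rejected instead of silently corrupting memory.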
Generally speaking (implementations differ across platforms), when malloc or a similar heap-based allocation call is made, the underlying library may translate it into a system call. When the library does that, it generally obtains space in larger regions, equal to or larger than the amount the program requested.
This arrangement avoids frequent system calls to the kernel for allocation and lets the library satisfy the program's heap requests faster. (This is certainly not the only reason; other reasons may exist as well.)
A side effect of this arrangement is the behavior you are observing. Once again, your program is not guaranteed to be able to write to a non-allocated zone without crashing or seg-faulting every time; that depends on the particular binary's memory layout. Try writing to an even higher array offset, and your program will eventually fault.
As for what you should or should not do, the people who responded above have summarized it fairly well. I have no better answer, except that such issues should be prevented, and that can only be done by being careful when allocating memory.
One way of understanding it is through this crude example: when you request 1 byte in userspace, the kernel has to allocate at least a whole page (4 KB on some Linux systems, for example, which is the most granular allocation at kernel level). To improve efficiency by reducing frequent calls, the kernel assigns this whole page to the calling library, which the library can then hand out as more requests come in. Thus, writes to or reads from such a region will not necessarily generate a fault. You would just get garbage.
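To make the page arithmetic concrete, here is a small sketch that asks the system for its page size and rounds a request up to whole pages. It assumes a POSIX system (sysconf and _SC_PAGESIZE come from unistd.h); the helper names page_size and pages_needed are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the system page size in bytes, or 0 if it cannot be queried. */
size_t page_size(void) {
    long n = sysconf(_SC_PAGESIZE);
    return n > 0 ? (size_t)n : 0;
}

/* Number of whole pages needed to hold `bytes` bytes (rounds up). */
size_t pages_needed(size_t bytes, size_t page) {
    assert(page > 0);
    return (bytes + page - 1) / page;
}
```

With a 4096-byte page, a 1-byte request still needs one full page, which is why the bytes just past a small allocation are usually mapped and do not fault.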
In C, you can in practice read from any address that is mapped, and you can write to any address that is mapped to a read-write page.
In practice, the OS gives a process memory in chunks (pages), commonly 4K or 8K each (but this is OS-dependent). The C library then manages these pages and maintains lists of what is free and what is allocated, giving the user the addresses of these blocks when asked to with malloc.
So when you get a pointer back from malloc(), you are pointing to an area within such a page that is read-writable. This area may contain garbage, or it may contain other malloc'd memory, it may contain the memory used for stack variables, or it may even contain the memory used by the C library to manage the lists of free/allocated memory!
So you can imagine that writing to addresses beyond the range you have malloc'ed can really cause problems.
Such problems are a real pain to debug, because the crash usually occurs much later than the corruption did.
Only when you read from or write to an address that does not correspond to a mapped page will you get a crash, e.g. when reading from address 0x0 (NULL).
malloc, free and pointers are very fragile in C (and to a slightly lesser degree in C++), and it is very easy to shoot yourself in the foot accidentally.
There are many third-party tools for memory checking which wrap each memory allocation/free/access with checking code. They do tend to slow your program down, depending on how much checking is applied.
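To give a rough idea of what such checking code does, here is a toy sketch of a guarded allocator: it places a known byte pattern right after the user area and later verifies that the pattern is intact. The names guarded_malloc, guarded_check and guarded_free are hypothetical, and real tools (Valgrind, AddressSanitizer and the like) are far more thorough than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8     /* guard bytes placed after the user area */
#define GUARD_BYTE 0xA5  /* arbitrary fill pattern */

/* Header stored in front of the user area. The union with max_align_t
 * keeps the user pointer suitably aligned for any object type. */
typedef union {
    size_t size;
    max_align_t align;
} GuardHeader;

/* Allocates `size` user bytes followed by GUARD_SIZE guard bytes. */
void *guarded_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(GuardHeader) - GUARD_SIZE)
        return NULL;   /* total size would overflow */
    GuardHeader *hdr = malloc(sizeof *hdr + size + GUARD_SIZE);
    if (!hdr)
        return NULL;
    hdr->size = size;
    unsigned char *user = (unsigned char *)(hdr + 1);
    memset(user + size, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Returns true if the guard bytes after the user area are untouched. */
bool guarded_check(const void *ptr) {
    assert(ptr != NULL);
    const GuardHeader *hdr = (const GuardHeader *)ptr - 1;
    const unsigned char *guard = (const unsigned char *)ptr + hdr->size;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (guard[i] != GUARD_BYTE)
            return false;
    }
    return true;
}

/* Frees a block obtained from guarded_malloc. */
void guarded_free(void *ptr) {
    if (ptr)
        free((GuardHeader *)ptr - 1);
}
```

A write one byte past the user area lands in the guard region, so a later guarded_check returns false. This only catches small overruns, and only when you check, which is part of why the real tools instrument every access and cost more at run time.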
Think of memory as a great big blackboard divided into little squares. Writing to a memory location is equivalent to erasing a square and writing a new value there. The purpose of malloc generally isn't to bring memory (blackboard squares) into existence; rather, it's to identify an area of memory (a group of squares) that's not being used for anything else, and to take some action to ensure that it won't be used for anything else until further notice.
Historically, it was pretty common for microprocessors to expose all of the system's memory to an application. A piece of code Foo could in theory pick an arbitrary address and store its data there, but with a couple of major caveats.
Newer systems include more monitoring to keep track of which processes own which areas of memory, and kill off processes that access memory they don't own. In many such systems, each process will often start with a small blackboard and, if attempts are made to malloc more squares than are available, processes can be given new chunks of blackboard area as needed. Nonetheless, there will often be some blackboard area available to each process which hasn't yet been reserved for any particular purpose. Code could in theory use such areas to store information without bothering to allocate them first, and such code would work if nothing happened to use the memory for any other purpose, but there would be no guarantee that such memory areas wouldn't be used for some other purpose at some unexpected time.
Usually malloc will allocate more memory than you ask for, for alignment purposes. Also, the process really does have read/write access to the heap memory region. So reading a few bytes outside of the allocated region seldom triggers any errors.
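If you are curious how much extra you got, some C libraries let you ask. The sketch below assumes glibc (or another libc that provides the non-standard malloc_usable_size in malloc.h); the helper name usable_size_for is made up for this example.

```c
#include <assert.h>
#include <malloc.h>   /* non-standard: malloc_usable_size() (glibc) */
#include <stddef.h>
#include <stdlib.h>

/* Allocates `requested` bytes, asks the allocator how large the block
 * really is, frees it, and returns that size. Returns 0 if malloc fails. */
size_t usable_size_for(size_t requested) {
    void *p = malloc(requested);
    if (!p)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

The result is at least the requested size and often a little larger. That slack is an implementation detail, so it explains why the overrun did not crash but is not something to rely on.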
But you still should not do it. Since the memory you're writing to may be regarded as unoccupied, or may in fact be occupied by something else, anything can happen: for example, the 2nd and 3rd key/value pairs may become garbage later, or an unrelated vital function may crash because of invalid data you've stomped onto its malloc-ed memory.
(Also, either use char[≥4] as the type of key or malloc the key. The string literals in your example have static storage duration and stay valid, but if a key ever points into a buffer stored on the stack, it will become invalid once that function returns.)
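A minimal sketch of the "malloc the key" option, assuming the table should own its own copy of each key. The helper name key_dup is hypothetical and not from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of `key`, or NULL on allocation failure.
 * The caller owns the copy and must free() it. */
char *key_dup(const char *key) {
    size_t len = strlen(key) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, key, len);
    return copy;
}
```

The table entry would then store the pointer returned by key_dup instead of the caller's pointer, and free it when the entry is removed, so the key survives even if the original buffer lived on the stack.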