
Optimal way of using malloc and realloc for dynamic storing

Original 2017-04-21 08:17:07 3 2 c

I'm trying to figure out the optimal way of using malloc and realloc for receiving an unknown number of characters from the user, storing them, and printing them only at the end.

I've figured that calling realloc too many times won't be so smart. So instead, I allocate a set amount of space each time, let's say sizeof(char) * 100, and at the end of the file I use realloc to fit the size of the whole thing precisely.

What do you think? Is this a good way to go about it? Would you take a different path?

Please note, I have no intention of using linked lists, getchar(), or putchar(). Using only malloc and realloc is a must.

2 solutions

If you realloc to fit the exact amount of data needed, then you are optimizing for memory consumption. This will likely give slower code because 1) you get extra realloc calls and 2) you might not allocate amounts that fit well with CPU alignment and data cache. Possibly this also causes heap fragmentation issues because of the repeated reallocs, in which case it could actually waste memory.

It's hard to answer what's "best" generically, but the method below is fairly common, as it is a good compromise between reducing the execution time spent on realloc calls and lowering memory use:

You allocate a segment, then keep track of how much of this segment is user data. It is a good idea to allocate size_t mempool_size = n * _Alignof(int); bytes, and it is probably also wise to use an n which is divisible by 8.

Each time you run out of free memory in this segment, you realloc to mempool_size*2 bytes. That way you keep doubling the available memory each time.
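The doubling scheme described above could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the mempool struct and the mempool_init and mempool_append names are hypothetical, and the initial size follows the n * _Alignof(int) suggestion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; all names are illustrative only. */
typedef struct {
    char  *data;      /* start of the allocated segment */
    size_t used;      /* bytes of user data stored so far */
    size_t capacity;  /* bytes currently allocated */
} mempool;

/* Allocate the initial segment of n * _Alignof(int) bytes.
   Returns 0 on success, -1 if malloc fails. */
static int mempool_init(mempool *p, size_t n)
{
    assert(p != NULL && n > 0);
    p->capacity = n * _Alignof(int);
    p->used = 0;
    p->data = malloc(p->capacity);
    return p->data != NULL ? 0 : -1;
}

/* Append len bytes, doubling the capacity until they fit.
   Returns 0 on success, -1 on failure (the old contents stay valid). */
static int mempool_append(mempool *p, const char *src, size_t len)
{
    if (len == 0)
        return 0;

    size_t new_cap = p->capacity;
    while (new_cap - p->used < len) {
        if (new_cap > SIZE_MAX / 2)
            return -1;              /* doubling would overflow size_t */
        new_cap *= 2;
    }
    if (new_cap != p->capacity) {
        /* Use a temporary so the old block is not leaked on failure. */
        char *tmp = realloc(p->data, new_cap);
        if (tmp == NULL)
            return -1;
        p->data = tmp;
        p->capacity = new_cap;
    }
    memcpy(p->data + p->used, src, len);
    p->used += len;
    return 0;
}
```

A caller would run mempool_init once, call mempool_append for each piece of input, and free(p.data) at the end. Assigning the result of realloc to a temporary first keeps the original block reachable if the call fails.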

I've figured that calling realloc too many times won't be so smart.

How have you figured it out? Because the only way to really know is to measure the performance.

Your strategy may need to differ based on how you are reading the data from the user. If you are using getchar() you probably don't want to use realloc() to increase the buffer size by one char each time you read a character. However, a good realloc() will be much less inefficient than you think even in these circumstances. The minimum block size that glibc will actually give you in response to a malloc() is, I think, 16 bytes. So going from 0 to 16 characters and reallocing each time doesn't involve any copying. Similarly, for larger reallocations a new block might not need to be allocated; it may be possible to make the existing block bigger. Don't forget that even at its slowest, realloc() will be faster than a person can type.
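One way to check the claim about in-place growth on your own system is to grow a buffer one byte at a time and count how often realloc() hands back a different address. The sketch below assumes nothing about any particular allocator, the function name count_realloc_moves is made up for illustration, and the result will vary between C libraries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow a buffer one byte at a time up to `total` bytes and return how many
   realloc() calls returned a different address (which implies the contents
   were moved to a new block). Returns -1 if an allocation fails. */
static long count_realloc_moves(size_t total)
{
    assert(total > 0);

    char *buf = malloc(1);
    if (buf == NULL)
        return -1;

    long moves = 0;
    for (size_t size = 2; size <= total; size++) {
        /* Save the address as an integer before realloc invalidates it. */
        uintptr_t before = (uintptr_t)buf;

        char *tmp = realloc(buf, size);
        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        buf = tmp;
        buf[size - 1] = 'x';        /* write into the newly added byte */

        if ((uintptr_t)buf != before)
            moves++;
    }
    free(buf);
    return moves;
}
```

Printing count_realloc_moves(4096) on a given machine shows how many of the 4095 realloc() calls actually relocated the block. An unchanged address means the block was grown in place.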

Most people don't go for that strategy. What can be typed can be piped, so the argument that people don't type very fast doesn't necessarily work. Normally, you introduce the concept of capacity. You allocate a buffer with a certain capacity and when it gets full, you increase its capacity (with realloc()) by adding a new chunk of a certain size. The initial size and the reallocation size can be tuned in various ways. If you are reading user input, you might go for small values, e.g. 256 bytes; if you are reading files off disk or across the network, you might go for larger values, e.g. 4 KB or bigger.
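A minimal sketch of the capacity idea, assuming a fixed 256-byte chunk and fread() for input (the question rules out getchar()). The read_all name and the CHUNK_SIZE constant are illustrative choices, and the final shrinking realloc mirrors what the question proposes rather than anything this answer requires.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 256  /* assumed initial capacity and growth increment */

/* Read everything from `in` into a malloc'd buffer that grows by CHUNK_SIZE
   whenever it is full. On success returns a NUL-terminated buffer that the
   caller must free, and stores its length in *out_len. Returns NULL on
   allocation failure or read error. */
static char *read_all(FILE *in, size_t *out_len)
{
    assert(in != NULL && out_len != NULL);

    size_t capacity = CHUNK_SIZE;
    size_t length = 0;
    char *buf = malloc(capacity);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always keep one byte free for the terminating NUL. */
        size_t room = capacity - length - 1;
        if (room == 0) {
            char *tmp = realloc(buf, capacity + CHUNK_SIZE);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            capacity += CHUNK_SIZE;
            room = capacity - length - 1;
        }
        size_t got = fread(buf + length, 1, room, in);
        length += got;
        if (got < room)
            break;                  /* end of input or read error */
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    buf[length] = '\0';

    /* Optional final shrink to the exact size; keep the old block if it fails. */
    char *fit = realloc(buf, length + 1);
    if (fit != NULL)
        buf = fit;

    *out_len = length;
    return buf;
}
```

Calling read_all(stdin, &len) collects piped or typed input until end of file, after which the buffer can be printed in one go and freed. Changing CHUNK_SIZE is the tuning knob the paragraph above refers to.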

The increment size doesn't even need to be constant; you could choose to double the size for each needed reallocation. This is the strategy used by some programming libraries. For example, the Java implementation of a hash table uses it, I believe, and so possibly does the Cocoa implementation of an array.

It's impossible to know beforehand what the best strategy in any particular situation is. I would pick something that feels right and then, if the application has performance issues, I would do testing to tune it. Your code doesn't have to be the fastest possible, only fast enough.
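If tuning does turn out to be necessary, a rough timing harness along these lines could compare increment sizes. It is only a sketch: time_growth is a hypothetical helper, clock() measures CPU time at coarse resolution, and a synthetic loop like this says little about a real workload.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Grow a buffer to at least `total` bytes in increments of `step` bytes and
   return the CPU time spent in seconds, or -1.0 if an allocation fails. */
static double time_growth(size_t total, size_t step)
{
    assert(total > 0 && step > 0);

    clock_t start = clock();
    char *buf = NULL;               /* realloc(NULL, n) acts like malloc(n) */
    size_t capacity = 0;

    while (capacity < total) {
        size_t next = capacity + step;
        char *tmp = realloc(buf, next);
        if (tmp == NULL) {
            free(buf);
            return -1.0;
        }
        buf = tmp;
        buf[capacity] = 'x';        /* write into the newly added region */
        capacity = next;
    }
    free(buf);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing, say, time_growth(1 << 20, 1) with time_growth(1 << 20, 4096) on the target machine gives a first impression of how much the increment size matters there.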

However, one thing I absolutely would not do is overlay a home-rolled memory algorithm over the top of the built-in allocator. If you find yourself maintaining a list of blocks you are not using instead of freeing them, you are doing it wrong. This is what got OpenSSL into trouble.