How to rewrite this word frequency counter program for text files in C?

Question

Here we see the program which computes how often each word of a text file occurs. After a few small corrections it works perfectly for sufficiently small files. I wanted to use it on a large text file, but I get a "Segmentation fault" error. The reason is the declaration of the array

 char p[1000][512],

which is far too small for a large text (if I understand correctly, it can store only 1000 words, and in general many of them may be duplicates). If I try to enlarge the dimensions of p, I also get this error (on my computer I cannot create arrays larger than 2000*2000).

Could the code above be modified so that it can handle large text files? If so, how? Could you write the modified code?

Answer 1

Consider allocating your array on the heap using malloc.

When you declare your array like char p[1000][512] inside a function, it allocates 512 * 1000 = 512,000 bytes (about 500 KiB) on the stack. The stack is small (typically only a few megabytes), so it cannot hold enough words for a large file, and making the array bigger only makes the overflow happen sooner. When you allocate memory using malloc, you request it from the heap instead, which is limited only by the memory available to your process.
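If you want to keep the two-dimensional p[i][j] indexing of the original program, a single malloc call can also allocate the whole table as one contiguous block on the heap. This is a minimal sketch, assuming a fixed maximum word length of 512 as in the question; the names MAX_WORD_LEN, word_t and alloc_word_table are hypothetical:

```c
#include <stdlib.h>

#define MAX_WORD_LEN 512                /* row width of the original p[1000][512] */

typedef char word_t[MAX_WORD_LEN];      /* one fixed-size word slot */

/* Allocate `rows` word slots as ONE contiguous block on the heap.
   The result is indexed like the old array: p[i] is a word, p[i][j] a char.
   Returns NULL if the allocation fails; release it with a single free(p). */
word_t *alloc_word_table(size_t rows)
{
    return malloc(rows * sizeof(word_t));
}
```

For example, word_t *p = alloc_word_table(20000); gives about 10 MB of word slots, more than a typical stack allows, and the rest of the program can keep using p[i] unchanged. The row count still has to be chosen in advance, though.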

So, instead of your code, you could do something like this:

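#include <stdlib.h>

typedef char * string_t;
/* STRINGS_COUNT pointers, each pointing to its own heap buffer of NUM_CHARS_PER_STRING chars */
string_t * stringsArray = malloc(sizeof(string_t) * STRINGS_COUNT);
for (size_t i = 0; i < STRINGS_COUNT; ++i)
   stringsArray[i] = malloc(sizeof(char) * NUM_CHARS_PER_STRING);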
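The allocation loop above does not check whether malloc succeeded, which matters once the arrays get large. A minimal sketch of the same idea with error handling, wrapped in two helper functions; the names alloc_strings and free_strings are hypothetical:

```c
#include <stdlib.h>

typedef char * string_t;

/* Allocate `count` strings of `len` chars each.
   On failure, free whatever was already allocated and return NULL. */
string_t *alloc_strings(size_t count, size_t len)
{
    string_t *arr = malloc(count * sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < count; ++i) {
        arr[i] = malloc(len);
        if (arr[i] == NULL) {
            while (i > 0)
                free(arr[--i]);     /* release the rows allocated so far */
            free(arr);
            return NULL;
        }
    }
    return arr;
}

/* Release everything allocated by alloc_strings. */
void free_strings(string_t *arr, size_t count)
{
    if (arr == NULL)
        return;
    for (size_t i = 0; i < count; ++i)
        free(arr[i]);
    free(arr);
}
```

Keeping allocation and release in a matched pair of functions makes it harder to forget a free or to leak the rows when a later malloc fails.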

Don't forget to free the allocated memory after using it, like this:

for (size_t i = 0; i < STRINGS_COUNT; ++i)
   free(stringsArray[i]);
free(stringsArray);
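Even on the heap, a table with a fixed number of rows still overflows once the file has more words than rows. A common alternative is to store each distinct word only once, together with its count, and grow the table with realloc as needed. The sketch below is not the asker's program (which is not shown); it assumes a word is a maximal run of letters, folded to lower case, and all names (struct word_table, count_words, free_table, etc.) are hypothetical:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One distinct word and how many times it occurred. */
struct word_count {
    char  *word;
    size_t count;
};

/* Growable table of distinct words, kept on the heap. */
struct word_table {
    struct word_count *items;
    size_t len;   /* number of distinct words stored    */
    size_t cap;   /* number of slots currently allocated */
};

/* Count `word` once: bump its counter or append it as a new entry.
   Returns 0 on success, -1 if memory ran out. */
static int add_word(struct word_table *t, const char *word)
{
    for (size_t i = 0; i < t->len; ++i)        /* linear search: simple, not fast */
        if (strcmp(t->items[i].word, word) == 0) {
            t->items[i].count++;
            return 0;
        }
    if (t->len == t->cap) {                    /* full: double the capacity */
        size_t new_cap = t->cap ? t->cap * 2 : 64;
        struct word_count *p = realloc(t->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        t->items = p;
        t->cap = new_cap;
    }
    size_t n = strlen(word) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, word, n);
    t->items[t->len].word = copy;
    t->items[t->len].count = 1;
    t->len++;
    return 0;
}

/* Read `in` and count words (maximal runs of letters, an assumption),
   folded to lower case. Returns 0 on success, -1 if memory ran out. */
int count_words(FILE *in, struct word_table *t)
{
    char  *buf = NULL;                         /* current word, grown as needed */
    size_t len = 0, cap = 0;
    int c;
    do {
        c = getc(in);
        if (c != EOF && isalpha(c)) {
            if (len + 1 >= cap) {              /* keep room for the '\0' */
                size_t new_cap = cap ? cap * 2 : 32;
                char *p = realloc(buf, new_cap);
                if (p == NULL) { free(buf); return -1; }
                buf = p;
                cap = new_cap;
            }
            buf[len++] = (char)tolower(c);
        } else if (len > 0) {                  /* a word just ended */
            buf[len] = '\0';
            if (add_word(t, buf) != 0) { free(buf); return -1; }
            len = 0;
        }
    } while (c != EOF);
    free(buf);
    return 0;
}

/* Free every stored word and the table itself. */
void free_table(struct word_table *t)
{
    for (size_t i = 0; i < t->len; ++i)
        free(t->items[i].word);
    free(t->items);
    t->items = NULL;
    t->len = t->cap = 0;
}
```

To use it, open the file with fopen, declare struct word_table t = {0};, call count_words(f, &t), print t.items[i].word and t.items[i].count for i < t.len, then call free_table(&t). Neither the number of words nor the word length is limited by a fixed array any more. The linear search makes the run time grow with (words in file) × (distinct words), so for very large files a hash table or sorting the words first would be noticeably faster.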

Answer 1 (score 2), 2016-06-08 13:36:43