
Generating an array of word pointers in C

I have to read a text file of 264064 words into a buffer and then build a separate array of word pointers into that buffer. I am not sure how to create the pointer array, since each pointer refers to a word of a different length within the buffer. Any hints on how to approach this problem?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int i,wordCount=0;
    long bufsize;
    int ch; //int rather than char, so the comparison with EOF is reliable

    //Open File and get number of lines in file
    FILE *fp = fopen("words2.txt", "r");
    if (fp == NULL) {
        printf("Error!");
        exit(1);
    }
    do {
        ch = fgetc(fp);
        if (ch == '\n')
        {
            wordCount++;
        }

    } while (ch != EOF);
    fclose(fp);
    printf("%d\n",wordCount);

    //Reading Words into buffer rawtext
    char *rawtext;
    fp = fopen("words2.txt", "rb");

    if (fp != NULL)
    {
        if (fseek(fp, 0L, SEEK_END) == 0) {
            bufsize = ftell(fp);
            if (bufsize == -1) {
                exit(1);
            }
            rawtext = malloc(sizeof(char) * (bufsize + 1));

            if (fseek(fp, 0L, SEEK_SET) != 0) { exit(1); }

            size_t newLen = fread(rawtext, sizeof(char), bufsize, fp);
            if (ferror(fp) != 0) {
                fputs("Error reading file", stderr);
            } else {
                rawtext[newLen++] = '\0';
            }
        }
        //Print out buffer
        printf("%s",rawtext);
        fclose(fp);
        free(rawtext);//Free allocated memory

        char *ptr[wordCount];//Array for word-pointers
    }
}

If you keep rawtext (i.e. do not free it), you can walk through its content with strchr(currWord, '\n'): store the current position in the array, find the next newline character, overwrite it with '\0' to terminate the word, and continue with the character after it. At the end, each element of ptr points to one word inside rawtext. That is why rawtext must not be freed while ptr is still in use: the pointers would then refer to invalid memory.

The following code should work:

char* currWord = rawtext;
int nrOfWords = 0;
char* newlinePos;
while ((newlinePos = strchr(currWord,'\n')) != NULL) {
  *newlinePos = '\0';
  ptr[nrOfWords++] = currWord;
  currWord = newlinePos + 1;
}
if (*currWord) {
  ptr[nrOfWords++] = currWord;
}
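
The loop above writes into ptr without checking how many slots the array has. As a minimal sketch of the same technique with a bounds check, the function below takes the capacity as a parameter. The name split_words and the parameter cap are assumptions made for illustration; they do not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): splits buf in place at
 * each '\n' and stores the start of every word in ptr. At most cap pointers
 * are written. buf must be writable and NUL-terminated.
 * Returns the number of pointers stored. */
size_t split_words(char *buf, char **ptr, size_t cap)
{
    assert(buf != NULL && ptr != NULL);

    size_t n = 0;
    char *curr = buf;
    char *nl;

    while (n < cap && (nl = strchr(curr, '\n')) != NULL) {
        *nl = '\0';       /* terminate the current word */
        ptr[n++] = curr;  /* remember where it starts */
        curr = nl + 1;    /* the next word begins after the newline */
    }
    /* A last word that is not followed by a newline */
    if (n < cap && *curr != '\0') {
        ptr[n++] = curr;
    }
    return n;
}
```

Because the words are terminated inside buf itself, no characters are copied; the pointers stay valid only as long as buf does.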

Side note: the declaration char *ptr[wordCount] is a variable-length array, which will typically put your pointer array on the stack. The stack has limited space, usually much less than the heap, so this can become a problem if your file contains a lot of words. Use char **ptr = malloc((wordCount+1) * sizeof(char*)) to reserve the memory on the heap instead. Note the +1 after wordCount: it covers the case where the last word is not terminated by a newline.
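
The heap allocation described in the side note could look roughly like the sketch below. It counts the newlines directly in the buffer instead of reading the file a second time, which is an assumption of this example rather than something the original code does; the name alloc_word_ptrs and the out-parameter cap are likewise hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): counts the '\n'
 * characters in the NUL-terminated buf and allocates one pointer slot per
 * newline, plus one extra slot for a last word without a trailing newline.
 * The number of slots is stored in *cap.
 * Returns NULL (and sets *cap to 0) if the allocation fails. */
char **alloc_word_ptrs(const char *buf, size_t *cap)
{
    assert(buf != NULL && cap != NULL);

    size_t lines = 0;
    for (const char *p = buf; *p != '\0'; p++) {
        if (*p == '\n') {
            lines++;
        }
    }

    char **ptr = malloc((lines + 1) * sizeof *ptr);
    if (ptr == NULL) {
        *cap = 0;
        return NULL;
    }
    *cap = lines + 1;
    return ptr;
}
```

The returned array can then be filled by the strchr loop from the answer. The caller releases it with free(ptr), and should free rawtext only after the last use of the pointers.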

