
C - Unexpected random characters being read from end of file

I'm trying to read in a list of comma-separated words from a CSV file, and I'm having trouble dealing with the seemingly random characters that appear at the end of the file when it is read in by C. The characters at the end of the file seem to change completely when I add or remove words from the list.

This is what is contained in the file: johnny,david,alan,rodney,bob,ronald,andrew,hola,goodbye . That is copied exactly, there is no accidental space or carriage return at the end.

Here is what gets read in by the program:

This is the code that is reading in the text:

    char* name;
    FILE *fp;
    char *fcontent;
    int wordCount = 0;
    char delim = ',';
    long fsize;
    bool end = false;
    char guessedLetters[26];
    int guessNum = 0;
    int lives = 0;

    for (int i = 0; i < 26; i++) {
        guessedLetters[i] = '\0';
    }

    fp = fopen(WORDS_FILENAME, "r");

    if (fp == NULL) {
        printf("Words File Exception: Exiting.");
        return 1;
    }

    fseek(fp, 0L, SEEK_END);
    fsize = ftell(fp);
    fseek(fp, 0L, SEEK_SET);

    fcontent = (char*)calloc(fsize, sizeof(char));

    if (fcontent == NULL) {
        printf("No words in file: Exiting.");
        return 1;
    }

    fread(fcontent, sizeof(char), fsize, fp);
    char *fcontent2 = malloc(strlen(fcontent + 1));
    strcpy(fcontent2, fcontent);
    fclose(fp);

The words are being split down into an array of words, and the rogue characters are being kept appended onto the end of the last words, causing quite a lot of problems later on in the program.

This is the code splitting the string into the array wordArr :

    char wordArr[wordCount][15];

    char *ptr2 = strtok(fcontent2, &delim);
    int count = 0;

    while (ptr2 != NULL) {
        strcpy(wordArr[count], ptr2);
        count++;
        ptr2 = strtok(NULL, &delim);
    }

Perhaps if it isn't possible to completely omit the characters from being read, they could be omitted in the splitting process?

Thanks, Jack.

First, you open the file in text mode:

fp = fopen(WORDS_FILENAME, "r");

Per the C standard, 7.21.9.4 The ftell function, paragraph 2:

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

You can't use ftell() on a text stream to tell how many bytes might be read.

So you'd have to open the file in binary mode to use ftell() (but see note below):

fp = fopen(WORDS_FILENAME, "rb");

Now you have the file size:

fseek(fp, 0L, SEEK_END);
fsize = ftell(fp);
fseek(fp, 0L, SEEK_SET);

fcontent = (char*)calloc(fsize, sizeof(char));

But that leaves no room for a '\0' terminator, so it should be:

// no need to cast a void * in C, and sizeof(char)
// is **always** one by definition
fcontent = calloc(fsize + 1 , 1);

Now you'll have a terminated string for the file contents.
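
Taken together, the steps above might look like the following sketch. The helper name read_whole_file and its out_len parameter are hypothetical, not part of the question's code, and the sketch assumes a platform where fseek(fp, 0L, SEEK_END) works on a binary stream, which the C standard itself does not promise.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a freshly
 * allocated, null-terminated buffer. Returns NULL on failure; the caller
 * frees the result. Assumes fseek(SEEK_END)/ftell give the byte count of
 * a binary stream, as common platforms do. */
char *read_whole_file(const char *path, size_t *out_len)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");           /* binary mode, not "r" */
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long fsize = ftell(fp);
    if (fsize < 0 || fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    /* One extra byte for the terminator; calloc zero-fills the buffer. */
    char *buf = calloc((size_t)fsize + 1, 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread may return fewer bytes than requested; terminate at the
     * number of bytes actually read. */
    size_t got = fread(buf, 1, (size_t)fsize, fp);
    fclose(fp);
    buf[got] = '\0';

    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```

Because the buffer is terminated at the number of bytes actually read, strlen(), strcpy() and strtok() can be used on the result without running past the end of the data.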

Note about fseek() on a binary stream

Using fseek() to reach the end of a binary stream is literally undefined behavior per the C standard.

Per 7.21.9.2 The fseek function, paragraph 3:

For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified position is the beginning of the file if whence is SEEK_SET, the current value of the file position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

Footnote 268 even states:

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.

The only reason you can use fseek(fp, 0L, SEEK_END); is because most operating systems extend the C language and actually define that to work.
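
If you would rather not depend on that extension, one possible alternative is to skip fseek() and ftell() entirely and grow the buffer while reading. This is only a sketch: the helper name slurp_stream, the starting capacity of 256 and the doubling strategy are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining on `fp` into a growing,
 * null-terminated buffer without ever asking for the file size.
 * Returns NULL on allocation failure or read error; the caller frees
 * the result. */
char *slurp_stream(FILE *fp, size_t *out_len)
{
    assert(fp != NULL);

    size_t cap = 256;                       /* assumed starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always leave one byte free for the terminator. */
        size_t got = fread(buf + len, 1, cap - len - 1, fp);
        len += got;
        if (len < cap - 1)                  /* short read: EOF or error */
            break;

        /* Buffer is full: double it and keep reading. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

This uses only fread(), ferror() and realloc(), so it makes no assumption about what SEEK_END means for the stream.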

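The data that fread() reads does not include a terminating null character.

You need to check the number of characters actually read, then set the terminating null character "manually":

size_t cnt = fread(fcontent, sizeof(char), fsize, fp);
fcontent[cnt] = '\0';

Note that fread() returns a size_t, so cnt can never be negative. A count smaller than fsize means end-of-file or a read error, and ferror(fp) tells the two apart. Also make sure fcontent was allocated with at least fsize + 1 bytes; otherwise fcontent[cnt] writes one past the end of the buffer when the whole file is read.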
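
Once the buffer is properly terminated, the splitting step from the question can be sketched as follows. Two things in it go beyond what the question shows and should be read as assumptions: strtok() expects its delimiter argument to be a null-terminated string, so the sketch passes a string rather than the address of a single char, and it also treats '\r' and '\n' as delimiters in case the file ends with a line break. The helper name split_words and the MAX_WORD_LEN macro are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD_LEN 15   /* assumed limit, mirroring wordArr[...][15] */

/* Hypothetical helper: splits a null-terminated, comma-separated buffer
 * in place and copies at most max_words words into words[]. Returns the
 * number of words stored. */
size_t split_words(char *text, char words[][MAX_WORD_LEN], size_t max_words)
{
    assert(text != NULL && words != NULL);

    /* A delimiter *string*: comma plus any trailing line-ending bytes. */
    static const char delims[] = ",\r\n";
    size_t count = 0;

    for (char *tok = strtok(text, delims);
         tok != NULL && count < max_words;
         tok = strtok(NULL, delims)) {
        /* snprintf truncates an over-long word instead of overflowing. */
        snprintf(words[count], MAX_WORD_LEN, "%s", tok);
        count++;
    }
    return count;
}
```

The caller sizes the array up front, for example by counting commas in the buffer before splitting, so the array never has zero rows.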
