
Reading text file using fgets() and strtok() to separate strings in line yielding unwanted behaviour

I am trying to read a text file with the following format, using fgets() and strtok().

1082018 1200 79 Meeting with President
2012018 1200 79 Meet with John at cinema
2082018 1400 30 games with Alpha
3022018 1200 79 sports

I need to separate the first value from the rest of the line, for example:

key=21122019, val = 1200 79 Meeting with President

To do so I am using strchr() for val and strtok() for key. However, the key value remains unchanged when reading from the file. I can't understand why this is happening, since I am allocating space for in_key inside the while loop and placing it in an array at a different index each time.

My code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define N 1000 // max number of lines to be read
#define VALLEN 100
#define MAXC 1024

#define ALLOCSIZE 1000 /*size of available space*/
static char allocbuf[ALLOCSIZE]; /* storage for alloc*/
static char *allocp = allocbuf; /* next free position*/

char *alloc(int n) { /* return a pointer to n characters*/
    if (allocbuf + ALLOCSIZE - allocp >= n) { /*it fits*/
        allocp += n;
        return allocp - n; /*old p*/
    } else /*not enough room*/
        return 0;
}

int main(int argc, char** argv) {
    FILE *inp_cal;
    inp_cal = fopen("calendar.txt", "r+");

    char buf[MAXC];
    char *line[1024];
    char *p_line;

    char *in_val_arr[100];
    char *in_key_arr[100];
    int count = 0;
    char delimiter[] = " ";

    if (inp_cal) {
        printf("Processing file...\n");
        while (fgets(buf, MAXC, inp_cal)) {
            p_line = malloc(strlen(buf) + 1); // malloced with size of buffer.
            char *in_val;
            char *in_key;

            strcpy(p_line, buf);    //used to create a copy of input buffer
            line[count] = p_line;

            /* separating the line based on the first space. The words after
             * the delimeter will be copied into in_val */
            char *copy = strchr(p_line, ' ');
            if (copy) {
                if ((in_val = alloc(strlen(line[count]) + 1)) == NULL) {
                    return -1;
                } else {
                    strcpy(in_val, copy + 1);
                    printf("arr: %s", in_val);
                    in_val_arr[count] = in_val;
                }
            } else
                printf("Could not find a space\n");

            /* We now need to get the first word from the input buffer*/
            if ((in_key = alloc(strlen(line[count]) + 1)) == NULL) {
                return -1;
            }
            else {
                in_key = strtok(buf, delimiter);
                printf("%s\n", in_key);
                in_key_arr[count] = in_key; // <-- Printed out well
                count++;
            }
        }
        for (int i = 0; i < count; ++i)
            printf("key=%s, val = %s", in_key_arr[i], in_val_arr[i]); //<-- in_key_arr[i] contains same values throughout, unlike above
        fclose(inp_cal);
    }
    return 0;
}

while-loop output (correct):

Processing file...
arr: 1200 79 Meeting with President
1082018
arr: 1200 79 Meet with John at cinema
2012018
arr: 1400 30 games with Alpha
2082018
arr: 1200 79 sports
3022018

for-loop output (incorrect):

key=21122019, val = 1200 79 Meeting with President
key=21122019, val = 1200 79 Meet with John
key=21122019, val = 1400 30 games with Alpha
key=21122019, val = 1200 79 sports

Any suggestions on how this can be improved and why this is happening? Thanks

Continuing from the comment: in attempting to use strtok to separate your data into key, val, somenum and the remainder of the line as a string, you are making things harder than they need to be.

If the beginning of your lines are always:

key val somenum rest

you can simply use sscanf to parse key, val and somenum into, e.g., three unsigned values and the rest of the line into a string. To help preserve the relationship between each key, val, somenum and string, storing the values from each line in a struct greatly eases keeping track of everything. You can even allocate for the string to reduce storage to the exact amount required. For example, you could use something like the following:

typedef struct {    /* struct to handle values */
    unsigned key, val, n;
    char *s;
} keyval_t;

Then within main() you could allocate for some initial number of structs, keep an index as a counter, loop reading each line using a temporary struct and buffer, then allocate for the string (+1 for the nul-terminating character) and copy the values to your struct. When the number of structs filled reaches your allocated amount, simply realloc the number of structs and keep going.

For example, let's say you initially allocate for NSTRUCT structs and read each line into buf, e.g.

...
#define NSTRUCT    8    /* initial struct to allocate */
#define MAXC    1024    /* read buffer size (don't skimp) */
...
    size_t ndx = 0,         /* used */
        max = NSTRUCT;      /* allocated */
    keyval_t *kv = NULL;    /* ptr to struct */
    ...
    /* allocate/validate storage for max struct */
    if (!(kv = malloc (max * sizeof *kv))) {
        perror ("malloc-kv");
        return 1;
    }
    ...
    while (fgets (buf, MAXC, fp)) { /* read each line of input */
    ...

Within your while loop, you simply need to parse the values with sscanf , eg

        char str[MAXC];
        size_t len;
        keyval_t tmp = {.key = 0};  /* temporary struct for parsing */
        if (sscanf (buf, "%u %u %u %1023[^\n]", &tmp.key, &tmp.val, &tmp.n,
            str) != 4) {
            fprintf (stderr, "error: invalid format, line '%zu'.\n", ndx);
            continue;
        }

With the values parsed, you check whether your index has reached the number of structs you have allocated and realloc if required (note the use of a temporary pointer with realloc, so the original block is not lost if realloc fails), e.g.

        if (ndx == max) {    /* check if realloc needed */
            /* always realloc with temporary pointer */
            void *kvtmp = realloc (kv, 2 * max * sizeof *kv);
            if (!kvtmp) {
                perror ("realloc-kv");
                break;  /* don't exit, kv memory still valid */
            }
            kv = kvtmp; /* assign new block to pointer */
            max *= 2;   /* increment max allocated */
        }

Now with storage for the struct , simply get the length of the string, copy the unsigned values to your struct, and allocate length + 1 chars for kv[ndx].s and copy str to kv[ndx].s , eg

        len = strlen(str);              /* get length of str */
        kv[ndx] = tmp;                  /* assign tmp values to kv[ndx] */
        kv[ndx].s = malloc (len + 1);   /* allocate block for str */
        if (!kv[ndx].s) {               /* validate */
            perror ("malloc-kv[ndx].s");
            break;  /* ditto */
        }
        memcpy (kv[ndx++].s, str, len + 1); /* copy str to kv[ndx].s */
    }

( note: you can use strdup if you have it to replace malloc through memcpy with kv[ndx].s = strdup (str); , but since strdup allocates, don't forget to check kv[ndx].s != NULL before incrementing ndx if you go that route)
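
The strdup route from the note above can be sketched as a small helper. This is only an illustration: kv_store is a hypothetical name, not part of the answer's code, and it assumes a POSIX system where strdup is available (the feature-test macro at the top is that assumption made explicit). The point it shows is that ndx is advanced only after the duplicate has been validated.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: exposes strdup on POSIX systems */
#include <stdlib.h>
#include <string.h>

typedef struct {    /* same layout as the struct used in this answer */
    unsigned key, val, n;
    char *s;
} keyval_t;

/* kv_store is a hypothetical helper: copy tmp into kv[*ndx], duplicate str
 * with strdup, and advance *ndx only if the duplication succeeded.
 * Returns 1 on success, 0 on allocation failure (*ndx is left unchanged).
 * The caller must already have made sure kv has room for element *ndx. */
int kv_store (keyval_t *kv, size_t *ndx, keyval_t tmp, const char *str)
{
    char *copy = strdup (str);  /* strdup allocates, so it can fail */

    if (!copy)
        return 0;

    kv[*ndx] = tmp;             /* assign parsed values */
    kv[*ndx].s = copy;          /* store the validated copy */
    (*ndx)++;                   /* increment only after validation */

    return 1;
}
```

Inside the read loop this would stand in for the malloc through memcpy lines, e.g. if (!kv_store (kv, &ndx, tmp, str)) break; with each kv[i].s still freed at the end as before.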

That's pretty much the easy and robust way to capture your data. It is now contained in an allocated array of struct which you can use as needed, eg

    for (size_t i = 0; i < ndx; i++) {
        printf ("kv[%2zu] : %8u  %4u  %2u  %s\n", i,
            kv[i].key, kv[i].val, kv[i].n, kv[i].s);
        free (kv[i].s);     /* free string */
    }

    free (kv);  /* free structs */

(don't forget to free the memory you allocate)

Putting it all together, you could do something like the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSTRUCT    8    /* initial struct to allocate */
#define MAXC    1024    /* read buffer size (don't skimp) */

typedef struct {    /* struct to handle values */
    unsigned key, val, n;
    char *s;
} keyval_t;

int main (int argc, char **argv) {

    char buf[MAXC];         /* line buffer */
    size_t ndx = 0,         /* used */
        max = NSTRUCT;      /* allocated */
    keyval_t *kv = NULL;    /* ptr to struct */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("fopen-file");
        return 1;
    }

    /* allocate/validate storage for max struct */
    if (!(kv = malloc (max * sizeof *kv))) {
        perror ("malloc-kv");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) { /* read each line of input */
        char str[MAXC];
        size_t len;
        keyval_t tmp = {.key = 0};  /* temporary struct for parsing */
        if (sscanf (buf, "%u %u %u %1023[^\n]", &tmp.key, &tmp.val, &tmp.n,
            str) != 4) {
            fprintf (stderr, "error: invalid format, line '%zu'.\n", ndx);
            continue;
        }
        if (ndx == max) {    /* check if realloc needed */
            /* always realloc with temporary pointer */
            void *kvtmp = realloc (kv, 2 * max * sizeof *kv);
            if (!kvtmp) {
                perror ("realloc-kv");
                break;  /* don't exit, kv memory still valid */
            }
            kv = kvtmp; /* assign new block to pointer */
            max *= 2;   /* increment max allocated */
        }
        len = strlen(str);              /* get length of str */
        kv[ndx] = tmp;                  /* assign tmp values to kv[ndx] */
        kv[ndx].s = malloc (len + 1);   /* allocate block for str */
        if (!kv[ndx].s) {               /* validate */
            perror ("malloc-kv[ndx].s");
            break;  /* ditto */
        }
        memcpy (kv[ndx++].s, str, len + 1); /* copy str to kv[ndx].s */
    }

    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    for (size_t i = 0; i < ndx; i++) {
        printf ("kv[%2zu] : %8u  %4u  %2u  %s\n", i,
            kv[i].key, kv[i].val, kv[i].n, kv[i].s);
        free (kv[i].s);     /* free string */
    }

    free (kv);  /* free structs */
}

Example Use/Output

Using your data file as input, you would receive the following:

$ ./bin/fgets_sscanf_keyval <dat/keyval.txt
kv[ 0] :  1082018  1200  79  Meeting with President
kv[ 1] :  2012018  1200  79  Meet with John at cinema
kv[ 2] :  2082018  1400  30  games with Alpha
kv[ 3] :  3022018  1200  79  sports

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through it.

$ valgrind ./bin/fgets_sscanf_keyval <dat/keyval.txt
==6703== Memcheck, a memory error detector
==6703== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6703== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6703== Command: ./bin/fgets_sscanf_keyval
==6703==
kv[ 0] :  1082018  1200  79  Meeting with President
kv[ 1] :  2012018  1200  79  Meet with John at cinema
kv[ 2] :  2082018  1400  30  games with Alpha
kv[ 3] :  3022018  1200  79  sports
==6703==
==6703== HEAP SUMMARY:
==6703==     in use at exit: 0 bytes in 0 blocks
==6703==   total heap usage: 5 allocs, 5 frees, 264 bytes allocated
==6703==
==6703== All heap blocks were freed -- no leaks are possible
==6703==
==6703== For counts of detected and suppressed errors, rerun with: -v
==6703== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have any further questions. If you need to further split kv[i].s, then you can think about using strtok.
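
If you do go on to split kv[i].s, a minimal sketch could look like the following. split_words is a hypothetical helper name, and the choice of space and tab as delimiters is an assumption about your data. Keep in mind that strtok writes nul characters into the string it tokenizes, so the original description is no longer intact afterwards.

```c
#include <stddef.h>
#include <string.h>

/* split_words is a hypothetical helper: tokenize s in place on spaces and
 * tabs, storing up to maxw token pointers in words. Returns the number of
 * tokens stored. The pointers refer to memory inside s, so they are valid
 * only as long as s itself is valid and not overwritten. */
size_t split_words (char *s, char **words, size_t maxw)
{
    size_t n = 0;

    for (char *tok = strtok (s, " \t"); tok && n < maxw;
         tok = strtok (NULL, " \t"))
        words[n++] = tok;       /* no copy is made, just a pointer into s */

    return n;
}
```

Because kv[i].s is an allocated, writable block, it can be passed directly, e.g. char *w[16]; size_t nw = split_words (kv[i].s, w, 16); but print or copy the full string first if you still need it in one piece.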

You are storing the same pointer in the in_key_arr over and over again.

You roughly need this:

in_key = strtok(buf, delimiter);
printf("%s\n", in_key);
char *newkey = malloc(strlen(in_key) + 1);  // <<<< allocate new memory
strcpy(newkey, in_key);
in_key_arr[count] = newkey;                 // <<<< store newkey
count++;

Disclaimer:

  • no error checking is done for brevity
  • the malloced memory needs to be freed once you're done with it.
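
With the error checking from the disclaimer filled in, the same idea might be wrapped up as below. This is a sketch, not the only way to do it, and dup_first_token is a hypothetical helper name. It shows why the copy matters: strtok returns a pointer into the line buffer, which fgets overwrites on the next iteration, so only a separate copy survives.

```c
#include <stdlib.h>
#include <string.h>

/* dup_first_token is a hypothetical helper: tokenize line in place with
 * strtok and return a newly malloc'ed copy of the first token, or NULL if
 * the line contains no token or the allocation fails. The caller is
 * responsible for freeing the returned pointer. */
char *dup_first_token (char *line, const char *delim)
{
    char *tok = strtok (line, delim);   /* points into line, not a copy */
    size_t len;
    char *copy;

    if (!tok)                           /* empty line or only delimiters */
        return NULL;

    len = strlen (tok);
    copy = malloc (len + 1);            /* +1 for the nul terminator */
    if (!copy)
        return NULL;

    memcpy (copy, tok, len + 1);        /* copy token including the nul */

    return copy;
}
```

In the question's loop this would be used as in_key_arr[count] = dup_first_token (buf, delimiter); followed by a NULL check before count++, and a free (in_key_arr[i]) for each stored key once the array is no longer needed.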

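You assign in_key an address with the call to alloc(), then immediately overwrite that pointer with the return value of strtok(), so every entry stored in in_key_arr ends up pointing into buf rather than at the allocated space. Copy the string returned by strtok() into in_key instead. The debug prints below show the addresses involved: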

       char *copy = strchr(p_line, ' ');
        if (copy) {
            if ((in_val = alloc(strlen(line[count]) + 1)) == NULL) {
                return -1;
            } else {
                printf("arr: %ul\n", in_val);
                strcpy(in_val, copy + 1);
                printf("arr: %s", in_val);
                in_val_arr[count] = in_val;
            }
        } else
            printf("Could not find a space\n");

        /* We now need to get the first word from the input buffer*/
        if ((in_key = alloc(strlen(line[count]) + 1)) == NULL) {
            return -1;
        }
        else {

            printf("key: %ul\n", in_key);
            in_key = strtok(buf, delimiter);
            printf("key:\%ul %s\n",in_key, in_key);
            in_key_arr[count++] = in_key; // <-- Printed out well
        }

output:

allocbuf: 1433760064l
Processing file...
all: 1433760064l
arr: 1433760064l
arr: 1200 79 Meeting with President
all: 1433760104l
key: 1433760104l
key:4294956352l 1082018

this change fixed it:

strcpy(in_key, strtok(buf, delimiter));
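
The one-line fix above passes the result of strtok straight to strcpy, which is undefined if strtok returns NULL (for example on an empty line). A slightly more defensive sketch that keeps the question's alloc() arena is shown below. arena_key is a hypothetical helper name, and sizing the allocation by the token length rather than the whole line length is my own adjustment: in the question each line takes two line-sized chunks out of a 1000-byte arena, so it runs out quickly.

```c
#include <string.h>

#define ALLOCSIZE 1000              /* same arena size as in the question */
static char allocbuf[ALLOCSIZE];    /* storage for alloc */
static char *allocp = allocbuf;     /* next free position */

static char *alloc (int n)          /* the question's allocator */
{
    if (allocbuf + ALLOCSIZE - allocp >= n) {   /* it fits */
        allocp += n;
        return allocp - n;
    }
    return NULL;                    /* not enough room */
}

/* arena_key is a hypothetical helper: take the first token of buf and copy
 * it into space obtained from alloc(). Returns the copy, or NULL if there
 * is no token or the arena is exhausted. */
char *arena_key (char *buf, const char *delim)
{
    char *tok = strtok (buf, delim);    /* points into buf */
    char *key;

    if (!tok)
        return NULL;

    key = alloc ((int) strlen (tok) + 1);   /* just enough for the token */
    if (!key)
        return NULL;

    strcpy (key, tok);              /* copy the characters, keep the pointer */

    return key;
}
```

Each call returns a distinct address inside allocbuf, so in_key_arr[count] = arena_key (buf, delimiter); no longer stores the same pointer for every line.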
