
Reading a stream of values from text file in C

I have a text file that may contain anywhere from one to 400 numbers. The numbers are separated by commas, and a semicolon marks the end of the number stream. At the moment I am reading the text file line by line with fgets, into a fixed array of 1024 elements (the maximum number of characters per line in the file). This is not ideal: if the file contains only one number, a 1024-element array is pointless. Is there a way to use fgets with malloc (or any other method) to improve memory efficiency?

If you plan to use this in production code, I would ask you to follow the suggestions made in the comments section.

But if your requirement is more for learning or school, here is a more complex approach.

Pseudo code

1. Find the size of the file in bytes, you can use "stat" for this.
2. Since the file format is known, from the file size, calculate the number of items.
3. Use the number of items to malloc.

Voila! :p

How to find file size

You can use stat as shown below:

#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    struct stat st;

    if (stat("file", &st) == 0) {
        /* st_size is an off_t, so cast it to match the format specifier */
        printf("fileSize: %lld  No. of Items: %lld\n",
               (long long)st.st_size, (long long)(st.st_size / 2));
        return 0;
    }

    printf("failed!\n");
    return 1;
}

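When run, this program prints the file size and the item count. Dividing the size by 2 is exact only for single-digit numbers; for longer numbers it is an upper bound, since every item takes at least one digit and one delimiter: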

$> cat file
1;
$> ./a.out
fileSize: 3  No. of Items: 1

$> cat file
1,2,3;
$> ./a.out
fileSize: 7  No. of Items: 3
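
The three pseudocode steps can be combined into one routine. What follows is a minimal sketch rather than the answerer's code: the helper name read_numbers is hypothetical, and it swaps in fscanf for the parsing to keep things short. It assumes a POSIX system (for stat) and that the file does not change between the stat call and the read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: read comma-separated integers terminated by ';'
 * from `path` into a buffer sized from the file length.
 * Returns the buffer (caller frees) and stores the count in *count,
 * or returns NULL on failure. */
long *read_numbers (const char *path, size_t *count)
{
    struct stat st;

    *count = 0;
    if (stat (path, &st) != 0 || st.st_size <= 0)
        return NULL;

    /* every item needs at least one digit plus one delimiter,
       so size/2 + 1 is a safe upper bound on the item count */
    size_t cap = (size_t) st.st_size / 2 + 1;
    long *nums = malloc (cap * sizeof *nums);
    if (!nums)
        return NULL;

    FILE *fp = fopen (path, "r");
    if (!fp) {
        free (nums);
        return NULL;
    }

    long value;
    char delim;
    /* "%ld" skips leading whitespace; " %c" picks up ',' or ';' */
    while (*count < cap && fscanf (fp, "%ld %c", &value, &delim) == 2) {
        nums[(*count)++] = value;
        if (delim == ';')           /* end-of-stream marker */
            break;
    }
    fclose (fp);

    return nums;
}
```

A caller would pass the filename and a size_t to receive the count, then free the returned pointer when done. For multi-digit numbers the buffer is larger than strictly needed, which is the price of sizing it before parsing.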

Disclaimer: Is this an optimal way to minimize the pre-allocated memory? No way in heaven! :)

Dynamically allocating space for your data is a fundamental tool for working in C. You might as well pay the price to learn it. The primary thing to remember is,

"if you allocate memory, you have the responsibility to track its use and preserve a pointer to the starting address for the block of memory so you can free it when you are done with it. Otherwise your code will leak memory like a sieve."

Dynamic allocation is straightforward. You allocate some initial block of memory and keep track of what you add to it. You must test that each allocation succeeds. You must also track how much of the block you have used, and reallocate or stop writing data when it is full, so that you never write beyond the end of the block. If you fail to test either, you will corrupt the memory associated with your code.

When you reallocate, always reallocate using a temporary pointer. If realloc fails, it returns NULL and leaves the original block untouched, so assigning the result directly to your original pointer overwrites the only reference to that block (leaking it and losing all previous data in it). Using a temporary pointer lets you handle the failure while keeping the original block.
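
The temporary-pointer pattern can be shown in isolation. This is a small sketch under assumed names: long_vec, long_vec_push and long_vec_free are hypothetical and are not part of the program further down. It doubles the capacity when the block is full and only updates the stored pointer after realloc has succeeded.

```c
#include <stdlib.h>

/* Hypothetical growable array of long; the names are illustrative. */
typedef struct {
    long  *data;    /* start of the block, kept so it can be freed */
    size_t used;    /* number of elements stored                   */
    size_t cap;     /* number of elements allocated                */
} long_vec;

/* Append value, doubling the capacity when the block is full.
 * Returns 0 on success, -1 on allocation failure; on failure the
 * existing contents of v are left intact. */
int long_vec_push (long_vec *v, long value)
{
    if (v->used == v->cap) {
        size_t newcap = v->cap ? v->cap * 2 : 8;
        /* realloc into a temporary: if it returns NULL, v->data
           still points at the old, valid block */
        long *tmp = realloc (v->data, newcap * sizeof *tmp);
        if (!tmp)
            return -1;
        v->data = tmp;
        v->cap = newcap;
    }
    v->data[v->used++] = value;
    return 0;
}

/* Release the block and reset the bookkeeping. */
void long_vec_free (long_vec *v)
{
    free (v->data);
    v->data = NULL;
    v->used = v->cap = 0;
}
```

Starting from a zero-initialized long_vec, the first push allocates 8 elements (realloc with a NULL pointer behaves like malloc), and each later growth doubles that. The initial size of 8 is an arbitrary choice for the sketch.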

Taking that into consideration, below we initially allocate space for 64 long values (you can easily change the code to handle any type, e.g. int, float, double...). The code then reads each line of data, using getline, which allocates and grows the line buffer as needed. strtol is used to parse the buffer, assigning values to array. idx is used as an index to keep track of how many values have been read; when idx reaches the current nmax, array is reallocated to twice its previous size and nmax is updated to reflect the change. The reading, parsing, checking and reallocating continues for every line of data in the file. When done, the values are printed to stdout, showing the 400 random values read from the test file, which is formatted as 353,394,257,...293,58,135;

To keep the read loop logic clean, I've put the error checking for the strtol conversion into a function xstrtol , but you are free to include that code in main() if you like. The same applies to the realloc_long function. To see when the reallocation takes place, you can compile the code with the -DDEBUG definition. Eg:

gcc -Wall -Wextra -DDEBUG -o progname yoursourcefile.c

The program expects your data filename as the first argument and you can provide an optional conversion base as the second argument (default is 10). Eg:

./progname datafile.txt [base (default: 10)]

Look over it, test it, and let me know if you have any questions.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>

#define NMAX 64

long xstrtol (char *p, char **ep, int base);
long *realloc_long (long *lp, unsigned long *n);

int main (int argc, char **argv)
{

    char *ln = NULL;                /* NULL forces getline to allocate  */
    size_t n = 0;                   /* buffer size (0 - getline allocates) */
    ssize_t nchr = 0;               /* number of chars actually read    */
    size_t idx = 0;                 /* array index counter              */
    long *array = NULL;             /* pointer to long                  */
    unsigned long nmax = NMAX;      /* initial reallocation counter     */
    FILE *fp = NULL;                /* input file pointer               */
    int base = argc > 2 ? atoi (argv[2]) : 10; /* base (default: 10)    */

    /* validate arguments, then open / validate file */
    if (argc < 2) {
        fprintf (stderr, "usage: %s datafile [base (default: 10)]\n", argv[0]);
        return 1;
    }
    if (!(fp = fopen (argv[1], "r"))) {
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate array of NMAX long using calloc to initialize to 0 */
    if (!(array = calloc (NMAX, sizeof *array))) {
        fprintf (stderr, "error: memory allocation failed.\n");
        fclose (fp);
        return 1;
    }

    /* read each line from file - separate into array       */
    while ((nchr = getline (&ln, &n, fp)) != -1)
    {
        char *p = ln;      /* pointer to ln read by getline */ 
        char *ep = NULL;   /* endpointer for strtol         */

        for (;;)    /* xstrtol exits on error; loop ends at end of line */
        {   /* parse/convert each number in line into array */
            array[idx++] = xstrtol (p, &ep, base);

            if (idx == nmax)        /* check NMAX / realloc */
                array = realloc_long (array, &nmax);

            /* skip delimiters/move pointer to next digit   */
            while (*ep && *ep != '-' && (*ep < '0' || *ep > '9')) ep++;
            if (*ep)
                p = ep;
            else
                break;
        }
    }

    free (ln);                      /* free memory allocated by getline */
    fclose (fp);                    /* close open file stream           */

    size_t i = 0;
    for (i = 0; i < idx; i++)
        printf (" array[%zu] : %ld\n", i, array[i]);

    free (array);

    return 0;
}

/* reallocate long pointer memory */
long *realloc_long (long *lp, unsigned long *n)
{
    long *tmp = realloc (lp, 2 * *n * sizeof *lp);
#ifdef DEBUG
    printf ("\n  reallocating %lu to %lu\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "%s() error: reallocation failed.\n", __func__);
        // return NULL;
        exit (EXIT_FAILURE);
    }
    lp = tmp;
    memset (lp + *n, 0, *n * sizeof *lp); /* zero the newly added half */
    *n *= 2;

    return lp;
}

long xstrtol (char *p, char **ep, int base)
{
    errno = 0;

    long tmp = strtol (p, ep, base);

    /* Check for various possible errors */
    if ((errno == ERANGE && (tmp == LONG_MIN || tmp == LONG_MAX)) ||
        (errno != 0 && tmp == 0)) {
        perror ("strtol");
        exit (EXIT_FAILURE);
    }

    if (*ep == p) {
        fprintf (stderr, "No digits were found\n");
        exit (EXIT_FAILURE);
    }

    return tmp;
}

Sample Output (with -DDEBUG to show reallocation)

$ ./bin/read_long_csv dat/randlong.txt

  reallocating 64 to 128

  reallocating 128 to 256

  reallocating 256 to 512
 array[0] : 353
 array[1] : 394
 array[2] : 257
 array[3] : 173
 array[4] : 389
 array[5] : 332
 array[6] : 338
 array[7] : 293
 array[8] : 58
 array[9] : 135
<snip>
 array[395] : 146
 array[396] : 324
 array[397] : 424
 array[398] : 365
 array[399] : 205

Memory Error Check

$ valgrind ./bin/read_long_csv dat/randlong.txt
==26142== Memcheck, a memory error detector
==26142== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26142== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26142== Command: ./bin/read_long_csv dat/randlong.txt
==26142==

  reallocating 64 to 128

  reallocating 128 to 256

  reallocating 256 to 512
 array[0] : 353
 array[1] : 394
 array[2] : 257
 array[3] : 173
 array[4] : 389
 array[5] : 332
 array[6] : 338
 array[7] : 293
 array[8] : 58
 array[9] : 135
<snip>
 array[395] : 146
 array[396] : 324
 array[397] : 424
 array[398] : 365
 array[399] : 205
==26142==
==26142== HEAP SUMMARY:
==26142==     in use at exit: 0 bytes in 0 blocks
==26142==   total heap usage: 7 allocs, 7 frees, 9,886 bytes allocated
==26142==
==26142== All heap blocks were freed -- no leaks are possible
==26142==
==26142== For counts of detected and suppressed errors, rerun with: -v
==26142== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
