
How to read a text file and store in an array in C

The program successfully prints the text file, but I want to store the contents of the text file in an array. I have looked in a lot of places, but I do not fully understand the information I have come across. Is there any way I can get some guidance?

#include <stdio.h>
#include <stdlib.h>

int main() 
{
       
// OPENS THE FILE
    FILE *fp = fopen("/classes/cs3304/cs330432/Programs/StringerTest/people.txt", "r");

    size_t len = 1000;
    char *word = malloc(sizeof(char) * len);
// CHECKS IF THE FILE EXISTS, IF IT DOESN'T IT WILL PRINT OUT A STATEMENT SAYING SO     
    if (fp == NULL) 
    {
        printf("file not found");
        return 0;
    }
    while(fgets(word, len, fp) != NULL) 
    {
        printf("%s", word);
    }
    free(word);
    fclose(fp);   // CLOSES THE FILE WHEN DONE
    
}

the text file has the following in it(just a list of words):

endorse
vertical
glove
legend
scenario
kinship
volunteer
scrap
range
elect
release
sweet
company
solve
elapse
arrest
witch
invasion
disclose
professor
plaintiff
definition
bow
chauvinist

Let's see if we can't get you straightened out. First, you are thinking in the right direction, and you should be commended for using fgets() to read each line into a fixed buffer (character array). What remains is to collect and store all of the lines so that they are available for use by your program -- that appears to be where the wheels fell off.

Basic Outline of Approach

In overview, when you want to handle an unlimited number of lines, there are two different types of blocks of memory you will allocate and manage. The first is a block of memory that holds some number of pointers (one for each line you will store). It doesn't matter how many you initially allocate, because you keep track of the number allocated (number available) and the number used. When (used == available) you realloc() a bigger block of memory to hold more pointers and keep on going.

The second type of block of memory you will handle is the storage for each line. No mystery there. You allocate storage for each character (+1 for the nul-terminating character) and you copy the line from your fixed buffer to the allocated block.

The two blocks of memory work together, because to create your collection, you simply assign the address for the block of memory holding the line of data to the next available pointer.

Let's think through a short example where we declare char **lines; as the pointer to the block of memory holding pointers. Say we allocate two pointers initially, so we have valid pointers available for lines[0] and lines[1]. We track the number of pointers available with nptrs and the number used with used, so initially nptrs = 2; and used = 0;.

When we read our first line with fgets(), we trim the '\n' from the end of the string and then get the length of the string (len = strlen(buffer);). We can then allocate storage for the string, assigning the address of the allocated block to our first pointer, e.g.

lines[used] = malloc (len + 1);

and then copy the contents of buffer to lines[used] (which is lines[0] for the first line), e.g.

memcpy (lines[used], buffer, len + 1);

( note: there is no reason to call strcpy() and have it scan for end-of-string again, we already know how many characters to copy -- including the nul-terminating character)
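The trim, allocate and copy steps for a single line can be sketched as one small helper. The name dup_line is hypothetical (it does not appear in the code further down), and the sketch assumes buf was filled by fgets(), so it holds a nul-terminated string that may or may not end in '\n':

```c
#include <stdlib.h>
#include <string.h>

/* dup_line is a hypothetical helper name used only for illustration.
 * It trims one trailing '\n' from buf (if present), allocates exactly
 * len + 1 bytes, and copies the line including the nul-terminating
 * character. Returns NULL if malloc() fails; the caller frees the result.
 */
char *dup_line (char *buf)
{
    size_t len = strlen (buf);          /* length including any '\n' */
    char *line;

    if (len && buf[len - 1] == '\n')    /* trim trailing '\n' if present */
        buf[--len] = 0;

    line = malloc (len + 1);            /* +1 for nul-terminating character */
    if (!line)
        return NULL;

    memcpy (line, buf, len + 1);        /* length is known, no strcpy() scan */
    return line;
}
```

With a helper like this, storing a line would reduce to something like lines[used] = dup_line (buffer); followed by a check for NULL.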

Finally, all that is needed to keep our counters happy is to increment used by one. We store the next line the same way, and on the 3rd iteration used == nptrs, so we realloc() more pointers (generally just doubling the number of pointers each time a realloc() is required). Doubling is a good balance between the number of calls to realloc() and the growth of the number of pointers. You are free to grow the allocation any way you like, but avoid calling realloc() for every line.

So you keep reading lines, checking if realloc() is required, reallocating if needed, and allocating for each line assigning the starting address to each of your pointers in turn. The only additional note is that when you realloc() you always use a temporary pointer so when realloc() fails and returns NULL , you do not overwrite your original pointer with NULL losing the starting address to the block of memory holding pointers -- creating a memory leak.

Implementation

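The details were left out of the overview, so let's look at a short example that reads an unknown number of lines from a file and stores each line in a collection using a pointer-to-pointer-to-char as described above. Each line is assumed to fit in the 1024-byte buffer, meaning at most 1023 characters including the '\n'; a longer line is not lost, but it ends up split across more than one stored entry. Don't use magic numbers in your code: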

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 1024       /* if you need a constant, #define one (or more) */
#define NPTRS   2       /* initial no. of pointers to allocate (lines) */

Don't hardcode filenames in your code either; that is what argc and argv are for in int main (int argc, char **argv). Pass the filename to read as the first argument to the program (or read from stdin by default if no argument is given):

int main (int argc, char **argv) {
    
    char buf[MAXC],                 /* fixed buffer to read each line */
        **lines = NULL;             /* pointer to pointer to hold collection of lines */
    size_t  nptrs = NPTRS,          /* number of pointers available */
            used = 0;               /* number of pointers used */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

( note: you should not need to recompile your program just to read from a different filename)

Now allocate and validate your initial number of pointers:

    /* allocate/validate block holding initial nptrs pointers */
    if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
        perror ("malloc-lines");
        exit (EXIT_FAILURE);
    }

Read each line, trim the '\n' from the end, and get the number of characters that remain after the '\n' has been removed (you can use strcspn() to do it all at once):

    while (fgets (buf, MAXC, fp)) {                 /* read each line into buf */
        size_t len;
        buf[(len = strcspn (buf, "\n"))] = 0;       /* trim \n, save length */
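The strcspn() idiom is safe whether or not a '\n' is actually present: if there is none (the last line of a file without a trailing newline, or a line too long for the buffer), strcspn() returns the full string length and the assignment just overwrites the existing nul-terminator. A minimal sketch of the same idea, wrapped in a hypothetical helper named trim_newline that also reports whether a newline was found (the had_nl flag is an illustrative addition, not part of the code above):

```c
#include <string.h>

/* trim_newline is a hypothetical helper for illustration only.
 * It overwrites the first '\n' in buf (if any) with the nul-character
 * and returns the resulting string length. *had_nl is set to 1 if a
 * '\n' was found and removed, 0 otherwise (e.g. a truncated line).
 */
size_t trim_newline (char *buf, int *had_nl)
{
    size_t len = strcspn (buf, "\n");   /* chars before '\n' or end */

    *had_nl = (buf[len] == '\n');       /* was a newline actually there? */
    buf[len] = 0;                       /* overwrite '\n' or existing nul */

    return len;
}
```

Checking a flag like had_nl is one way to detect that a line did not fit in the buffer, if that matters for your input.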

Next we check if a reallocation is needed and if so reallocate using a temporary pointer:

        if (used == nptrs) {    /* check if realloc of lines needed */
            /* always realloc using temporary pointer (doubling no. of pointers) */
            void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
            if (!tmp) {                             /* validate reallocation */
                perror ("realloc-lines");
                break;                              /* don't exit, lines still good */
            }
            lines = tmp;                            /* assign reallocated block to lines */
            nptrs *= 2;                             /* update no. of pointers allocated */
            /* (optionally) zero all newly allocated memory here */
        }
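The reallocation step, including the optional zeroing of the new pointers mentioned in the comment, could be factored out roughly as follows. This is only a sketch: grow_ptrs is a hypothetical name, and it assumes *nptrs is greater than zero and that doubling does not overflow size_t:

```c
#include <stdlib.h>

/* grow_ptrs is a hypothetical helper, not part of the code above.
 * It doubles the block of pointers at *lines using a temporary pointer
 * and sets every newly added pointer to NULL.
 * Returns 1 on success; returns 0 on failure, leaving *lines and
 * *nptrs unchanged so the lines already stored remain usable.
 * Assumes *nptrs > 0.
 */
int grow_ptrs (char ***lines, size_t *nptrs)
{
    size_t newn = 2 * *nptrs;
    void *tmp = realloc (*lines, newn * sizeof **lines);

    if (!tmp)                           /* original block still valid */
        return 0;

    *lines = tmp;                       /* assign reallocated block */

    for (size_t i = *nptrs; i < newn; i++)
        (*lines)[i] = NULL;             /* mark new pointers as unused */

    *nptrs = newn;                      /* update no. of pointers available */
    return 1;
}
```

Setting the unused pointers to NULL is not required by the read loop, but it makes it possible to iterate with while (lines[i]) style loops or to detect unused slots later.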

Now allocate and Validate the storage for the line and copy the line to the new storage, incrementing used when done -- completing your read-loop.

        /* allocate/validate storage for line */
        if (!(lines[used] = malloc (len + 1))) {
            perror ("malloc-lines[used]");
            break;
        }
        memcpy (lines[used], buf, len + 1);         /* copy line from buf to lines[used] */
        
        used += 1;                                  /* increment used pointer count */
    }
    /* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
    
    if (fp != stdin)   /* close file if not stdin */
        fclose (fp);
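The optional "realloc to 'used' pointers" step mentioned in the comment above might look like the sketch below. The helper name shrink_ptrs is hypothetical. Because shrinking is only an optimization, a failed realloc() is not treated as an error here and the original block is simply kept:

```c
#include <stdlib.h>

/* shrink_ptrs is a hypothetical helper for illustration only.
 * It reduces the block of pointers to exactly `used` pointers.
 * If realloc() fails the original block is still valid, so it is
 * returned unchanged. Nothing is done when used is 0, because the
 * behavior of realloc() with a size of 0 is not portable.
 */
char **shrink_ptrs (char **lines, size_t used)
{
    void *tmp;

    if (used == 0)
        return lines;

    tmp = realloc (lines, used * sizeof *lines);

    return tmp ? tmp : lines;           /* keep old block on failure */
}
```

It would be called once after the read loop, e.g. lines = shrink_ptrs (lines, used);. For a short-lived program that frees everything shortly afterwards, skipping this step is perfectly reasonable.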

Now you can use the lines stored in lines as needed in your program, remembering to free the memory for each line when done and then finally freeing the block of pointers, eg

    /* use lines as needed (simply outputting here) */
    for (size_t i = 0; i < used; i++) {
        printf ("line[%3zu] : %s\n", i, lines[i]);
        free (lines[i]);                            /* free line storage when done */
    }
    
    free (lines);       /* free pointers when done */
}

That's all that is needed. Now you can go read the 324,000 words in /usr/share/dict/words (or perhaps on your system /var/lib/dict/words depending on distro) and you will not have any problems doing so.

Input File

A short example file:

$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.

Example Use/Output

$ ./bin/fgets_lines_dyn_simple dat/captnjack.txt
line[  0] : This is a tale
line[  1] : Of Captain Jack Sparrow
line[  2] : A Pirate So Brave
line[  3] : On the Seven Seas.

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through one of them.

$ valgrind ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156== Memcheck, a memory error detector
==8156== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8156== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8156== Command: ./bin/fgets_lines_dyn_simple dat/captnjack.txt
==8156==
line[  0] : This is a tale
line[  1] : Of Captain Jack Sparrow
line[  2] : A Pirate So Brave
line[  3] : On the Seven Seas.
==8156==
==8156== HEAP SUMMARY:
==8156==     in use at exit: 0 bytes in 0 blocks
==8156==   total heap usage: 9 allocs, 9 frees, 5,796 bytes allocated
==8156==
==8156== All heap blocks were freed -- no leaks are possible
==8156==
==8156== For counts of detected and suppressed errors, rerun with: -v
==8156== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

The Full Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 1024       /* if you need a constant, #define one (or more) */
#define NPTRS   2       /* initial no. of pointers to allocate (lines) */

int main (int argc, char **argv) {
    
    char buf[MAXC],                 /* fixed buffer to read each line */
        **lines = NULL;             /* pointer to pointer to hold collection of lines */
    size_t  nptrs = NPTRS,          /* number of pointers available */
            used = 0;               /* number of pointers used */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    
    /* allocate/validate block holding initial nptrs pointers */
    if ((lines = malloc (nptrs * sizeof *lines)) == NULL) {
        perror ("malloc-lines");
        exit (EXIT_FAILURE);
    }
    
    while (fgets (buf, MAXC, fp)) {                 /* read each line into buf */
        size_t len;
        buf[(len = strcspn (buf, "\n"))] = 0;       /* trim \n, save length */
        
        if (used == nptrs) {    /* check if realloc of lines needed */
            /* always realloc using temporary pointer (doubling no. of pointers) */
            void *tmp = realloc (lines, (2 * nptrs) * sizeof *lines);
            if (!tmp) {                             /* validate reallocation */
                perror ("realloc-lines");
                break;                              /* don't exit, lines still good */
            }
            lines = tmp;                            /* assign reallocated block to lines */
            nptrs *= 2;                             /* update no. of pointers allocated */
            /* (optionally) zero all newly allocated memory here */
        }
        
        /* allocate/validate storage for line */
        if (!(lines[used] = malloc (len + 1))) {
            perror ("malloc-lines[used]");
            break;
        }
        memcpy (lines[used], buf, len + 1);         /* copy line from buf to lines[used] */
        
        used += 1;                                  /* increment used pointer count */
    }
    /* (optionally) realloc to 'used' pointers to size no. of pointers exactly here */
    
    if (fp != stdin)   /* close file if not stdin */
        fclose (fp);

    /* use lines as needed (simply outputting here) */
    for (size_t i = 0; i < used; i++) {
        printf ("line[%3zu] : %s\n", i, lines[i]);
        free (lines[i]);                            /* free line storage when done */
    }
    
    free (lines);       /* free pointers when done */
}

Look things over and let me know if you have any questions. If you also wanted to read lines of unknown length (millions of characters long), you would simply loop doing the same thing allocating and reallocating for each line until the '\n' character was found (or EOF ) marking the end of the line. It is no different in principle than what we have done above for the pointers.
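To make that last point concrete, here is a minimal sketch of reading one line of arbitrary length by calling fgets() repeatedly into a growing buffer. The function name read_line and the CHUNK size are assumptions chosen for illustration; the growth strategy (doubling through a temporary pointer) is the same one used for the pointers above:

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64    /* hypothetical initial size / minimum free space */

/* read_line is a hypothetical helper for illustration only.
 * Reads one line of any length from fp, growing the buffer as needed.
 * Returns an allocated string without the trailing '\n' (caller frees),
 * or NULL at end-of-file with nothing read or if allocation fails.
 */
char *read_line (FILE *fp)
{
    char *line = NULL;
    size_t size = 0, len = 0;           /* bytes allocated, chars stored */

    for (;;) {
        size_t room;

        if (size - len < CHUNK) {       /* make sure there is room to read */
            size_t newsize = size ? 2 * size : CHUNK;
            void *tmp = realloc (line, newsize);
            if (!tmp) {                 /* allocation failed, give up */
                free (line);
                return NULL;
            }
            line = tmp;
            size = newsize;
        }

        room = size - len;
        if (room > INT_MAX)             /* fgets() takes an int size */
            room = INT_MAX;

        if (!fgets (line + len, (int)room, fp)) {
            if (len == 0) {             /* EOF before any character read */
                free (line);
                return NULL;
            }
            break;                      /* EOF ends a final unterminated line */
        }

        len += strlen (line + len);     /* account for what was just read */

        if (len && line[len - 1] == '\n') {
            line[--len] = 0;            /* trim '\n', the line is complete */
            break;
        }
    }

    return line;
}
```

In the read loop above, a call such as lines[used] = read_line (fp) could then stand in for the fixed buffer plus the per-line malloc()/memcpy(), at the cost of each stored line possibly holding some unused space at its end.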


 