
fgets() vs getc() with big data?

I want to know specifically what the difference in speed is between fgets() and getc(), primarily for large amounts of data. I picked getc() over fgetc() because in other threads someone said it is faster, since it may be implemented as a macro. Similarly, gets() is discouraged and deprecated because, unlike fgets(), it has no buffer limit.

I made a little example program in the hope that someone knows how to measure the time and resources used by the two alternatives. The example needs a file containing at least CHARS_PER_LINE * LINES characters. If you run the program without arguments, it copies the file into memory with getc(); if any arguments are passed, it runs the fgets() version instead.

#include <stdio.h>
#include <stdlib.h>

#define CHARS_PER_LINE  2000
#define LINES           100

int main (int argc, char *argv[]) {
    // DECLARE VARS
    char **data;
    FILE *fp;

    // ALLOCATE 2D ARRAY MEMORY
    data = malloc(LINES * sizeof(char*));

    for (int i = 0; i < LINES; i++) {
        data[i] = malloc((CHARS_PER_LINE + 1) * sizeof(char));  // +1 for fgets()'s null terminator
    }

    // OPEN FILE FOR READING
    fp = fopen ("file.txt", "r");

    // COPY CHARS WITH GETC()
    if (argc == 1) {                    // if no arguments - getc
        for (int i = 0; i < LINES; i++) {
            for (int ii = 0; ii < CHARS_PER_LINE; ii++) {
                data[i][ii] = getc(fp);
            }
        }
    }  
    // COPY CHARS WITH FGETS()
    else {                              // if any amount of arguments passed - fgets
        for (int i = 0; i < LINES; i++) {
            fgets(data[i], (CHARS_PER_LINE + 1), fp);
                                        // does fgets not have a buffer limit?
        }
    }

    // CLOSE FILE
    fclose(fp);

    return(0);
}

I am aware that this is bad and insecure code and that a lot of assumptions are made, but I tried to keep the two paths as similar as possible so they can be compared fairly. The hypothesis is that getc() will outperform fgets() when few chars are fetched per iteration, and fgets() will outperform getc() for large amounts of data. The question remains: by how much, and at what rate?

In the hope that we can bring this eternal question to a conclusion with some hard numbers, please help.

fgets will read until a newline character is read (or the buffer is full). getc will read only a single byte. They do different things, so they are not really comparable.
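To make that difference concrete, here is a minimal sketch (mine, not from the thread) of a hypothetical helper that reproduces fgets()-like behavior using only getc(); note how much per-byte work it has to do by hand:

#include <stdio.h>

// Hypothetical helper: mimics fgets() using getc().
// Stops after size - 1 chars, at end of file, or after a newline,
// which it keeps in the buffer, just like fgets() does.
char *getc_line(char *buf, int size, FILE *fp) {
    int i = 0;
    int c;
    while (i < size - 1 && (c = getc(fp)) != EOF) {
        buf[i++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[i] = '\0';                  // fgets() null-terminates; getc() never does
    return (i == 0) ? NULL : buf;   // like fgets(): NULL when nothing was read
}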

OK, I decided to help myself, so I went and figured out a way to benchmark it - which, I bet, will now draw more criticism than the initial problem drew help.

You can get files with random chars by searching for a "random file generator", or with the little generator sketched below.
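A minimal sketch of such a generator (the lowercase alphabet, the time-based seed, and the command-line interface are my assumptions, nothing from the thread):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define CHARS_PER_LINE  2000
#define LINES           100

// Sketch: writes LINES * CHARS_PER_LINE random lowercase letters
// (no newlines) to the file named on the command line.
int main (int argc, char *argv[]) {
    if (argc != 2) {
        puts("Usage: gen <filename>");
        return 1;
    }

    FILE *fp = fopen(argv[1], "w");
    if (fp == NULL)
        return 1;

    srand((unsigned)time(NULL));
    for (long i = 0; i < (long)LINES * CHARS_PER_LINE; i++) {
        fputc('a' + rand() % 26, fp);
    }

    fclose(fp);
    return 0;
}

With two input files in place, here is the full benchmark code: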

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define CHARS_PER_LINE  2000
#define LINES           100

int main (int argc, char *argv[]) {
    // DECLARE VARS
    char **data;
    FILE *fp1;
    FILE *fp2;
    clock_t time_before, time_after;
    double time1, time2;
    char check1, check2;

    // ALLOCATE 2D ARRAY MEMORY
    data = malloc(LINES * sizeof(char*));

    for (int i = 0; i < LINES; i++) {
        data[i] = malloc((CHARS_PER_LINE + 1) * sizeof(char));  // +1 for fgets()'s null terminator
    }


    if (argc == 3) {
    // OPEN FILE 1 FOR READING
        fp1 = fopen (argv[1], "r");

        // COPY CHARS WITH GETC()
        time_before = clock();
        for (int i = 0; i < LINES; i++) {
            for (int ii = 0; ii < CHARS_PER_LINE; ii++) {
                data[i][ii] = getc(fp1);
            }
        }
        time_after = clock();

        time1 = (double)(time_after - time_before);     // raw clock ticks; divide by CLOCKS_PER_SEC for seconds
        check1 = data[50][1495];

        // CLOSE FILE 1
        fclose(fp1);

    // OPEN FILE 2 FOR READING
        fp2 = fopen (argv[2], "r");

        // COPY CHARS WITH FGETS()
        time_before = clock();
        for (int i = 0; i < LINES; i++) {
            fgets(data[i], CHARS_PER_LINE + 1, fp2);    // read the same 2000 chars per line as the getc() loop
        }
        time_after = clock();

        time2 = (double)(time_after - time_before);
        check2 = data[50][1495];

        // CLOSE FILE 2
        fclose(fp2);

    // PRINT RESULTS
        // appended characters to check for consistency and accuracy of reads
        printf("%s: %f\t%s: %f\t%c%c\n", argv[1], time1, argv[2], time2, check1, check2);
    }  
    else {
        puts("Wrong number of args. Put names of 2 files as arguments.");
    }

    return(0);
}

The output was:

a.txt: 1169.000000  b.txt: 95.000000    vp
b.txt: 826.000000   a.txt: 67.000000    be
a.txt: 1146.000000  b.txt: 91.000000    vp
b.txt: 1139.000000  a.txt: 89.000000    be
a.txt: 821.000000   b.txt: 77.000000    vp
b.txt: 1069.000000  a.txt: 91.000000    be
a.txt: 1141.000000  b.txt: 91.000000    vp
b.txt: 822.000000   a.txt: 70.000000    be
a.txt: 776.000000   b.txt: 68.000000    vp
b.txt: 996.000000   a.txt: 90.000000    be
a.txt: 1143.000000  b.txt: 92.000000    vp
b.txt: 1141.000000  a.txt: 93.000000    be
a.txt: 1138.000000  b.txt: 92.000000    vp
b.txt: 1140.000000  a.txt: 92.000000    be

As you can see, the results are overwhelmingly in favor of fgets(): by about an order of magnitude, around 11 or 12 times faster.
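One plausible explanation for the gap (an assumption on my part, not something measured here): on a threaded libc, every getc() call locks and unlocks the stream, while fgets() pays that cost once per line. POSIX provides getc_unlocked() to skip the per-call lock; a hedged drop-in replacement for the getc() loop in the benchmark above:

        // Sketch, assuming a POSIX system (declarations come from <stdio.h>).
        // Safe here only because a single thread reads fp1.
        flockfile(fp1);                     // take the stream lock once
        for (int i = 0; i < LINES; i++) {
            for (int ii = 0; ii < CHARS_PER_LINE; ii++) {
                data[i][ii] = getc_unlocked(fp1);
            }
        }
        funlockfile(fp1);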

If you have any useful specific comments about this, or if you would condescend to confirm or deny this research, please do.

Do I get gold now or something?

From your question and answer:

in my example I am reading full lines of known length

If the file has such a simple structure (same line lengths), the fastest will be:

    // ALLOCATE 2D ARRAY MEMORY
    char (*data)[CHARS_PER_LINE] = malloc(LINES * sizeof(*data));

    // fread() returns the number of complete items read, so reading
    // the whole file at once should yield LINES
    // (fpx is an already-opened FILE *).
    if (fread(data, sizeof(*data), LINES, fpx) != LINES)
    {
        /* Something gone wrong, do something. */
    }

But note that the lines will not be null-terminated.
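If C strings are needed anyway, a hedged variant is to give each row one spare byte and read line by line; everything here (the row-by-row fread(), the file name) is my assumption, not part of the answer above:

#include <stdio.h>
#include <stdlib.h>

#define CHARS_PER_LINE  2000
#define LINES           100

int main (void) {
    // One spare byte per row so each line can be null-terminated.
    char (*data)[CHARS_PER_LINE + 1] = malloc(LINES * sizeof(*data));
    if (data == NULL)
        return 1;

    FILE *fp = fopen("file.txt", "r");      // file name assumed, as in the question
    if (fp == NULL)
        return 1;

    // With the spare byte, one big fread() would misalign the rows,
    // so read each fixed-length line with its own call.
    for (int i = 0; i < LINES; i++) {
        if (fread(data[i], CHARS_PER_LINE, 1, fp) != 1) {
            /* Something gone wrong, do something. */
            break;
        }
        data[i][CHARS_PER_LINE] = '\0';     // terminate the row by hand
    }

    fclose(fp);
    free(data);
    return 0;
}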
