简体   繁体   中英

How to write text file faster than fprintf in c?

I have to save some graph data(array of structs) into text file. I made working program using fprintf but for extra points I need to be faster. I have spend couple hours googling if there is anything faster and try to use fwrite (but I wasn't able to fwrite as a text) I cannot really find any other functions etc.

This is my write function using fprintf:

void save_txt(const graph_t * const graph, const char *fname)
{
    int count = graph->num_edges, i = 0;
    FILE *f = fopen(fname, "w");
    while (count > 0) {
        int r = fprintf(f, "%d %d %d\n", (graph->edges[i].from), (graph->edges[i].to), (graph->edges[i].cost));
        i++;
        if (r >= 6) {
            count -= 1;
        } else {
            break;
        }
    }
    if (f) {
        fclose(f);
    }
}

I would try setting a write buffer on the stream, and experimenting with different sizes of buffer (eg 1K, 2K, 4K, 8K and so on). Notice that by default your file is already using a buffer of BUFSIZ value, and it might be already enough.

#define BUFFERSIZE 0x1000

void save_txt(const graph_t * const graph, const char *fname)
{
    int count = graph->num_edges, i = 0;
    unsigned char buf[BUFFERSIZE];

    FILE *f = fopen(fname, "w");
    setvbuf(f, buf, _IOFBF, BUFFERSIZE);

    ...

The output file f is born with the default BUFSIZ cache, so it might benefit from a larger fully buffered write cache.

Of course this assumes that you're writing to a relatively slow medium and that the time spent saving is relevant; otherwise, whatever is slowing you down is not here , and therefore increasing save performances won't help you appreciably.

There are instrumentations like prof and gprof that can help you determine where your program is spending the most time.

One, much more awkward, possibility is merging Kiwi's answer with a buffered write call to avoid the code in printf that verifies which format to use, since you already know this, and to use as few I/O calls as possible (even just one if BUFFERSIZE is larger than your destination file's length).

// These variables must now be global, declared outside save_txt.
char kiwiBuf[BUFFERSIZE];
size_t kiwiPtr = 0;
FILE *f;

void my_putchar(char c) {
    kiwiBuf[kiwiPtr++] = c;
    // Is the buffer full?
    if (kiwiPtr == BUFFERSIZE) {
        // Yes, empty the buffer into the file.
        flushBuffer();
    }
}

void flushBuffer() {
    if (kiwiPtr) {
        fwrite(kiwiBuf, kiwiPtr, 1, f);
        kiwiPtr = 0;
    }
}

You need now to flush the buffer before close:

void save_txt(const graph_t * const graph, const char *fname)
{
    int i, count = graph->num_edges;
    f = fopen(fname, "w");
    if (NULL == f) {
        fprintf(stderr, "Error opening %s\n", fname);
        exit(-1);
    }
    for (i = 0; i < count; i++) {
        my_put_nbr(graph->edges[i].from);
        my_putchar(' ');
        my_put_nbr(graph->edges[i].to);
        my_putchar(' ');
        my_put_nbr(graph->edges[i].cost);
        my_putchar('\n');
    }
    flushBuffer();
    fclose(f);
}

UPDATE

By declaring the my_putchar function as inline and with a 4K buffer, the above code (modified with a mock of graph reading from an array of random integers) is around 6x faster than fprintf on

Linux mintaka 4.12.8-1-default #1 SMP PREEMPT Thu Aug 17 05:30:12 UTC 2017 (4d7933a) x86_64 x86_64 x86_64 GNU/Linux
gcc version 7.1.1 20170629 [gcc-7-branch revision 249772] (SUSE Linux)

About 2x of that seems to come from buffering. Andrew Henle made me notice an error in my code: I was comparing results to a baseline of unbuffered output, but fopen uses by default a BUFSIZ value and on my system BUFSIZ is 8192. So basically I've "discovered" just that:

  • there is no advantage on a 8K buffer, 4K is enough
  • my original suggestion of using _IOFBF is utterly worthless as the system already does it for you. This in turn means that Kiwi's answer is the most correct , for - as Andrew pointed out - avoids printf 's checks and conversions.

Also, the overall increase (google Amdahl's Law) depends on what fraction of processing time goes into saving. Clearly if one hour of elaboration requires one second of saving, doubling saving speed saves you half a second; while increasing elaboration speed by 1% saves you 36 seconds, or 72 times more.

My own sample code was designed to be completely save-oriented with very large graphs; in this situation, any small improvement in writing speed reaps potentially huge rewards, which might be unrealistic in the real-world case.

Also (in answer to a comment), while using a small enough buffer will slow saving, it is not at all certain that using a larger buffer will benefit. Say that the whole graph generates in its entirety 1.2Kb of output; then of course any buffer value above 1.2Kb will yield no improvements. Actually, allocating more memory might negatively impact performances.

I would write a small function say print_graph(int int int) and call write directly in it

or something like this with my_putchar being a write call

int     my_put_nbr(int nb)
{
if (nb < 0)
{
    my_putchar('-');
    nb = -nb;
}
if (nb <= 9)
    my_putchar(nb + 48);
else
{
    my_put_nbr(nb / 10);
    my_put_nbr(nb % 10);
}
return (0);
}

I had to be 1.3x faster than fprintf, here is code that worked for me. I have to say that I had to submit it multiple times, sometimes I passed only 1 out of 5 tests with the same code. In conclusion, it is faster than fprintf but not reliably 1.3times faster..

void save_txt(const graph_t * const graph, const char *fname)
    {
        int count = graph->num_edges, i = 0;
        char c = '\n';
        char d = ' ';
        char buffer[15];
        FILE *f = fopen(fname, "w");
        while (count > 0) {
            itoa(graph->edges[i].from,buffer,10);
            fputs(buffer, f);
            putc(d, f);
            itoa(graph->edges[i].to,buffer,10);
            fputs(buffer, f);
            putc(d, f);
            itoa(graph->edges[i].cost,buffer,10);
            fputs(buffer, f);
            putc(c, f);
            i++;
            count -= 1;
        }
        if (f) {
            fclose(f);
        } 
    }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM