
Saving vector to file in parallel

I have a sorted vector of half a million numbers (in C++). Storing it to a text file takes about 10 seconds and uses only 50% CPU (one core). I was thinking of parallelising it by saving two separate files (the first and second halves of the vector) and then concatenating them.

The problem is that I can't find any way to concatenate files other than reading the second one byte by byte and appending it to the first. Is there a way (platform-independent via Boost, or Windows-specific) to join files efficiently?

What little you're telling us nonetheless strongly suggests a very inefficient way of writing your text file. Possibly you're using endl, which causes a flush. Replace that with \n. Next, if that doesn't speed things up, consider a more efficient number-to-text conversion than simply using <<; sprintf springs to mind. Finally, if you're still in the 10-second range instead of the 1/10-second range, consider more serious optimization (e.g., on a Windows machine you might allocate the file with the right size at the start, and so on).
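
A minimal sketch of the first two suggestions, assuming the vector holds int values (the question doesn't say) and using snprintf for the conversion. The names format_numbers and save_numbers are made up for illustration. Each number ends with '\n' rather than endl, and the finished text goes to the stream in a single write.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Format every value as decimal text followed by '\n' (never std::endl,
// which would flush the stream after each number).
// Assumption: the vector holds ints; the original post does not say.
std::string format_numbers(const std::vector<int>& values) {
    std::string out;
    out.reserve(values.size() * 12);  // at most 11 characters per int, plus '\n'
    char buf[16];
    for (int v : values) {
        const int n = std::snprintf(buf, sizeof buf, "%d", v);
        out.append(buf, static_cast<std::size_t>(n));
        out.push_back('\n');
    }
    return out;
}

// Hand the whole buffer to the stream in one call. Returns true on success.
bool save_numbers(const std::string& path, const std::vector<int>& values) {
    const std::string text = format_numbers(values);
    std::ofstream file(path, std::ios::binary);
    file.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(file);
}
```

Calling save_numbers("numbers.txt", v) on the half-million-element vector would be the whole job; whether it lands near the 1/10-second range depends on the machine, so measure it.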

Cheers & hth.,

Concatenating two files would probably take more time, as typical filesystems do not support simple splice operations to piece together multiple files into one file efficiently.

While there are some ways you can write to files using multiple cores, chances are very good that the bottleneck is actually your disk I/O speed. You can run vmstat 1 on a Linux system and many Unix systems to see your disk write speed (as well as many other neat measures). Windows has a similar tool, but I can never recall the name of the thing. If your writing speed is near the speed of your disk, you probably can't get more performance by adding more cores.

If you want to try anyway, there are three approaches that can work:

  • use multiple threads / processes to copy from your vector into a memory mapped location backed by your file. open(2) the file, run mmap(2) to map it into memory, and then start copying data.
  • use multiple threads / processes to copy data to disk using the pwrite(2) system call to specify the offset in the file to write that specific block of data
  • use a single thread and the aio_write(3) function to submit asynchronous writes to disk. (I'm not convinced that this will actually use multiple cores, but the libraries / kernel certainly could implement it that way.)

The first two approaches require that the data you're writing be of a predictable size; if you're really writing 500k numbers, they'll each take 4 or 8 bytes or some other fixed size, making it pretty easy: just assign the first 256k numbers to the first thread, and the next pile of numbers to the next thread, starting at 256*1024*8 bytes into the file.
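
A rough sketch of the pwrite(2) variant, assuming a POSIX system and 8-byte integers written in binary (both assumptions, not something the question states). The helper names pwrite_all and save_binary_parallel are invented for this example. Each thread gets a contiguous slice of the vector and writes it at the byte offset where that slice belongs.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Write `count` bytes at `offset`, looping because pwrite may write less than asked.
bool pwrite_all(int fd, const char* p, std::size_t count, off_t offset) {
    while (count > 0) {
        const ssize_t n = pwrite(fd, p, count, offset);
        if (n <= 0) return false;
        p += n;
        count -= static_cast<std::size_t>(n);
        offset += n;
    }
    return true;
}

// Assumption: fixed-size binary records, one std::int64_t per number.
bool save_binary_parallel(const char* path,
                          const std::vector<std::int64_t>& values,
                          unsigned nthreads) {
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    if (nthreads == 0) nthreads = 1;
    const std::size_t chunk = (values.size() + nthreads - 1) / nthreads;
    std::vector<char> ok(nthreads, 0);  // one result slot per thread
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t first = std::min(values.size(), t * chunk);
        const std::size_t last = std::min(values.size(), first + chunk);
        workers.emplace_back([&values, &ok, fd, t, first, last] {
            const char* bytes = reinterpret_cast<const char*>(values.data() + first);
            // The slice starting at element `first` lives at byte first * 8.
            ok[t] = pwrite_all(fd, bytes, (last - first) * sizeof(std::int64_t),
                               static_cast<off_t>(first * sizeof(std::int64_t)));
        });
    }
    for (auto& w : workers) w.join();
    close(fd);
    return std::all_of(ok.begin(), ok.end(), [](char c) { return c != 0; });
}
```

All threads share one file descriptor, which is fine here because pwrite takes an explicit offset and does not move the shared file position.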


Don't forget that spinning hard drives have latency when seeking all over your drive. Linear read and write patterns work best for spinning metal disks. The random access mechanisms I suggested in the first two bullet points would work best if each were writing to different disks (difficult with a single file:) or you had a solid state drive with no seek latency.

I would usually agree that your drive is the bottleneck, BUT if the CPU usage is exactly 50% on a dual-core system, that would imply that the CPU is indeed the problem. In that case it is the number-to-string conversion that is bogging things down. See Alf's answer for tips to optimise this.

To parallelise that, give each thread a chunk of the vector and an ostream. The first thread gets the file as its ostream, but the others get memory streams. Once the first thread has completed, and as each other thread completes (in order), write each memory stream to the file.

The formatting is now done in parallel, with the actual write-to-files being serialised.
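
Here is one way that could look, as a sketch only. It assumes int values and simplifies the scheme slightly: every thread, including the first, formats into its own memory stream, and the caller's ostream is only written to once a chunk is finished. The function name write_text_parallel is made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Format chunks in parallel, then write them to `out` in chunk order.
// Assumption: the numbers are ints; the question does not say.
void write_text_parallel(std::ostream& out, const std::vector<int>& values,
                         unsigned nthreads) {
    if (nthreads == 0) nthreads = 1;
    const std::size_t chunk = (values.size() + nthreads - 1) / nthreads;
    std::vector<std::string> parts(nthreads);  // one slot per thread, no sharing
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t first = std::min(values.size(), t * chunk);
        const std::size_t last = std::min(values.size(), first + chunk);
        workers.emplace_back([&values, &parts, t, first, last] {
            std::ostringstream mem;  // private memory stream for this chunk
            for (std::size_t i = first; i < last; ++i) mem << values[i] << '\n';
            parts[t] = mem.str();
        });
    }
    // Serialised part: wait for each chunk in order and append it to the output.
    for (unsigned t = 0; t < nthreads; ++t) {
        workers[t].join();
        out << parts[t];
    }
}
```

Pass an std::ofstream as `out` to write the file. Writing of chunk 0 can start while later chunks are still being formatted, so some overlap of I/O and formatting comes for free.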

Formatting is incredibly expensive. Writing 128M double-precision numbers to disk with fprintf() can easily take 10x as long as with fwrite(), because of the formatting and because of the large number of calls (compared to one big fwrite()); try the code below and see if you get similar timings. Text files aren't the way to deal with significant amounts of data; if you're not actually going to sit down and read it all yourself, it oughtn't be in ASCII.

If you do want to stay with text, and you impose a rigid format (e.g., all the numbers take exactly the same number of bytes in the file), then you can break up the list into big blocks, have each core format one set of numbers into a big string, and fseek() to the appropriate position in the file and dump it out. You can play with the block size to see what the best tradeoff for memory/performance is. If you really are bottlenecked by CPU, this should allow you to overlap I/O with computation and get some win.
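
The fixed-width idea might be sketched in C++ like this. The assumptions are mine: int values, a 12-byte record (11 characters, right-aligned, plus '\n'), and a filesystem that lets several handles write disjoint regions of one file, as POSIX systems do. The names kRecord, format_block and save_fixed_width are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

constexpr std::size_t kRecord = 12;  // 11 characters for a 32-bit int, plus '\n'

// Format values[first, last) as fixed-width records, so record i of the
// file always starts at byte i * kRecord.
std::string format_block(const std::vector<int>& values,
                         std::size_t first, std::size_t last) {
    std::string block((last - first) * kRecord, ' ');
    char buf[kRecord + 1];
    for (std::size_t i = first; i < last; ++i) {
        std::snprintf(buf, sizeof buf, "%11d\n", values[i]);
        std::memcpy(&block[(i - first) * kRecord], buf, kRecord);
    }
    return block;
}

bool save_fixed_width(const char* path, const std::vector<int>& values,
                      unsigned nthreads) {
    if (nthreads == 0) nthreads = 1;
    std::FILE* created = std::fopen(path, "wb");  // create or truncate the file
    if (!created) return false;
    std::fclose(created);
    const std::size_t chunk = (values.size() + nthreads - 1) / nthreads;
    std::vector<char> ok(nthreads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t first = std::min(values.size(), t * chunk);
        const std::size_t last = std::min(values.size(), first + chunk);
        workers.emplace_back([&values, &ok, path, t, first, last] {
            const std::string block = format_block(values, first, last);
            std::FILE* fp = std::fopen(path, "r+b");  // private handle per thread
            if (!fp) return;
            const bool sought =
                std::fseek(fp, static_cast<long>(first * kRecord), SEEK_SET) == 0;
            const bool written = sought &&
                std::fwrite(block.data(), 1, block.size(), fp) == block.size();
            ok[t] = (std::fclose(fp) == 0) && written;
        });
    }
    for (auto& w : workers) w.join();
    return std::all_of(ok.begin(), ok.end(), [](char c) { return c != 0; });
}
```

Here the block size is simply the vector length divided by the thread count; smaller blocks would use less memory at the cost of more seeks. The fprintf()/fwrite() timing program mentioned earlier is the C listing that follows.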

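#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <time.h>
/* Jonathan Dursi, SciNet */

/* Number of doubles to write: 128M values, about 1 GiB in binary. */
#define FILESIZE (1024*1024*128)

/* Write the whole array with a single fwrite(); returns elapsed seconds. */
int write_file_bin(const char *fname, const double *data, const int ndata) {

    FILE *fp;
    time_t start, end;

    fp = fopen(fname, "wb");
    assert(fp);

    start = time(NULL);
    fwrite(data, sizeof(double), ndata, fp);
    end = time(NULL);

    fclose(fp);

    return (int)(end-start);
}

/* Write one formatted number per line; returns elapsed seconds. */
int write_file_ascii(const char *fname, const double *data, const int ndata) {

    FILE *fp;
    time_t start, end;
    int i;

    fp = fopen(fname, "w");
    assert(fp);

    start = time(NULL);
    for (i=0;i<ndata;i++) {
        /* The format string is an assumption; the loop body was missing from the listing. */
        fprintf(fp, "%lf\n", data[i]);
    }
    end = time(NULL);

    fclose(fp);

    return (int)(end-start);
}

int main(int argc, char **argv) {
    double *data;
    int i;
    int asciitime, bintime;

    data = (double *)malloc(FILESIZE * sizeof(double));
    assert(data);
    for (i=0;i<FILESIZE;i++) {
        data[i] = i*(double)i/2.;
    }

    asciitime = write_file_ascii("data.txt",data,FILESIZE);
    bintime   = write_file_bin("data.dat",data,FILESIZE);

    printf("Time to write files: ASCII: %d, Binary: %d\n",asciitime, bintime);

    free(data);
    return 0;
}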

