
Optimizing .txt files creation speed

I've written the following simple test code, which creates 10 000 empty .txt files in a subdirectory.

#include <cstdio>   // printf
#include <iostream>
#include <time.h>
#include <string>
#include <fstream>

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);
        i++;
    }
}

int main()
{
    clock_t tStart1 = clock();
    CreateFiles();
    printf("\nHow long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
    std::cin.get();
    return 0;
}

Everything works fine. All 10 000 .txt files are created within ~3.55 seconds on my PC.

Question 1: Ignoring the conversion from int to std::string etc., is there anything I could optimize here so that the program creates the files faster? I specifically mean the std::ofstream outfile usage: perhaps using something else would be significantly faster?

Regardless, ~3.55 seconds is satisfying compared to the following:

I have modified the function so that it now also fills each .txt file with the integer i repeated several times and some constant text:

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);

        // Here is the part where I am filling the .txt with some data
        outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
        << i << " --more text-- " << i << " --even more-- " << i;
        i++;
    }
}

And now everything (creating the .txt files and filling them with a short line of data) executes within... ~37 seconds. That's a huge difference. And that's only 10 000 files.

Question 2: Is there anything I can optimize here? Perhaps there exists some alternative that would fill the .txt files more quickly. Or perhaps I have forgotten about something very obvious that slows down the entire process?

Or, perhaps I am exaggerating a little bit and ~37 seconds seems normal and optimized?

Thanks for sharing your insights!

File creation speed is hardware dependent: the faster the drive, the faster you can create the files.

I ran your code on an ARM processor (Snapdragon 636, on a mobile phone using Termux). Mobile phones have flash storage, which is very fast at I/O. It finished in under 3 seconds most of the time and occasionally took 5 seconds. That variation is expected, since the drive also has to serve reads and writes from other processes. You reported ~37 seconds on your hardware, so it is safe to conclude that I/O speed depends significantly on the hardware.


Nonetheless, I tried to optimize your code using two different approaches:

  • Using the C counterpart for I/O

  • Using C++, but writing the whole line in one go.

I ran the benchmark on my phone 50 times. Here are the average results:

  • C was the fastest, taking 2.73928 seconds on average to write your text to 10000 text files using fprintf.

  • C++ writing the complete line in one go took 2.7899 seconds. I used sprintf to format the complete line into a char[] and then wrote it with the << operator on the ofstream.

  • C++ Normal (your code) took 2.8752 seconds.

This behaviour is expected: writing in chunks is faster. Read this answer as to why. C was the fastest, no doubt.

Note that the difference here is not that significant, but on hardware with slow I/O it becomes significant.
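To make the "write in one chunk" idea concrete, here is a minimal sketch of a single file write done that way. FormatLine and WriteLineToFile are hypothetical helper names, not part of the benchmark code; the sketch assumes the same line layout as in the question and uses snprintf so the buffer cannot overflow.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: formats the whole line for file number i into one string.
std::string FormatLine(int i)
{
    char buffer[256];
    int n = std::snprintf(buffer, sizeof buffer,
        "%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",
        i, i, i, i, i, i, i);
    if (n < 0) {
        return std::string();                      // formatting failed
    }
    if (n >= static_cast<int>(sizeof buffer)) {
        n = static_cast<int>(sizeof buffer) - 1;   // output was truncated
    }
    return std::string(buffer, static_cast<std::string::size_type>(n));
}

// Hypothetical helper: opens the file and hands the stream the line in one write call.
bool WriteLineToFile(const std::string& path, int i)
{
    std::ofstream outfile(path, std::ios::binary);
    if (!outfile) {
        return false;                              // e.g. the directory does not exist
    }
    const std::string line = FormatLine(i);
    outfile.write(line.data(), static_cast<std::streamsize>(line.size()));
    return static_cast<bool>(outfile);
}
```

Since std::ofstream already buffers its output internally, the gain from formatting first is likely to be small, which would be consistent with the narrow gap in the timings above.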


Here is the code I used for the benchmark. You can test it yourself, but make sure to replace the std::system arguments with your own commands (they are different on Windows).

#include <cstdlib>   // std::system
#include <iostream>
#include <time.h>
#include <string>
#include <fstream>
#include <stdio.h>

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
       // int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results/"+string_i+".txt";
        std::ofstream outfile(file_dir);

        // Here is the part where I am filling the .txt with some data
        outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
        << i << " --more text-- " << i << " --even more-- " << i;
        i++;
    }
}

void CreateFilesOneGo(){
    int i = 1;
    while(i<=10000){
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results3/" + string_i + ".txt";
        char buffer[256];
        // snprintf bounds the write to the buffer size
        snprintf(buffer,sizeof buffer,"%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
        std::ofstream outfile(file_dir);
        outfile << buffer;
        i++;
    }
}

void CreateFilesFast(){
    int i = 1;
    while(i<=10000){
        // int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results2/"+string_i+".txt";
        FILE *f = fopen(file_dir.c_str(), "w");
        if (f != NULL) {  // skip the file if it could not be opened
            fprintf(f,"%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
            fclose(f);
        }
        i++;
    }
}

int main()
{
    double normal = 0, one_go = 0, c = 0;
    for (int u=0;u<50;u++){
        std::system("mkdir results results2 results3");

        clock_t tStart1 = clock();
        CreateFiles();
        //printf("\nNormal : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        normal+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;

        tStart1 = clock();
        CreateFilesFast();
        //printf("\nIn C : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        c+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;

        tStart1 = clock();
        CreateFilesOneGo();
        //printf("\nOne Go : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        one_go+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;

        std::system("rm -rf results results2 results3");
        std::cout<<"Completed "<<u+1<<"\n";
    }

    std::cout<<"C on average took : "<<c/50<<"\n";
    std::cout<<"Normal on average took : "<<normal/50<<"\n";
    std::cout<<"One Go C++ took : "<<one_go/50<<"\n";

    return 0;
}
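If you would rather not adapt the std::system commands per platform, the directory setup and cleanup can be sketched with std::filesystem instead. This assumes a C++17 compiler and standard library (older toolchains may need an extra link flag for the filesystem library); MakeResultDirs and RemoveResultDirs are hypothetical names, not part of the benchmark above.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: creates each directory; an already existing one is not an error.
bool MakeResultDirs(const std::vector<std::string>& names)
{
    for (const auto& name : names) {
        std::error_code ec;
        fs::create_directories(name, ec);
        if (ec) {
            return false;   // creation failed, e.g. because of permissions
        }
    }
    return true;
}

// Hypothetical helper: removes each directory together with everything inside it.
void RemoveResultDirs(const std::vector<std::string>& names)
{
    for (const auto& name : names) {
        std::error_code ec;
        fs::remove_all(name, ec);   // errors are ignored in this sketch
    }
}
```

In main, the two std::system calls could then become MakeResultDirs({"results", "results2", "results3"}) and RemoveResultDirs({"results", "results2", "results3"}).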

Also I used clang-7.0 as the compiler.
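A caveat on how these numbers were measured: clock() reports processor time, and on POSIX systems (Linux and Termux included) that does not count time the process spends blocked waiting for the disk, whereas the Microsoft C runtime's clock() returns elapsed wall-clock time. Figures taken on a phone and on a Windows PC may therefore not be directly comparable. A minimal sketch of a wall-clock measurement using std::chrono::steady_clock follows; ElapsedSeconds is a hypothetical helper name.

```cpp
#include <chrono>

// Hypothetical helper: runs work() once and returns the elapsed wall-clock time in seconds.
template <typename F>
double ElapsedSeconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With this, a line such as normal += ElapsedSeconds(CreateFiles); would replace the clock() arithmetic for that variant.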

If you have any other approach, let me know and I will test that too. If you find a mistake, do let me know and I will correct it as soon as possible.
