
Optimizing .txt files creation speed

I've written the following simple test code, which creates 10 000 empty .txt files in a subdirectory.

#include <iostream>
#include <time.h>
#include <string>
#include <fstream>
#include <cstdio>   // printf

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);
        i++;
    }
}

int main()
{
    clock_t tStart1 = clock();
    CreateFiles();
    printf("\nHow long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
    std::cin.get();
    return 0;
}

Everything works fine. All 10 000 .txt files are created within ~3.55 seconds (on my PC).

Question 1: Ignoring the conversion from int to std::string etc., is there anything I could optimize here so that the program creates the files faster? I specifically mean the std::ofstream outfile usage - perhaps using something else would be noticeably faster?

Regardless, ~3.55 seconds is satisfactory compared to the following:

I have modified the function so that it now also fills the .txt files with some data based on the integer i and some constant text:

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);

        // Here is the part where I am filling the .txt with some data
        outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
        << i << " --more text-- " << i << " --even more-- " << i;
        i++;
    }
}

And now everything (creating the .txt files and filling them with short data) executes within... ~37 seconds. That's a huge difference. And that's only 10 000 files.

Question 2: Is there anything I can optimize here? Perhaps there is some alternative that would fill the .txt files more quickly. Or perhaps I have forgotten about something very obvious that slows down the entire process?

Or perhaps I am exaggerating a little, and ~37 seconds is normal and already optimized?

Thanks for sharing your insights!

The speed of file creation is hardware dependent: the faster the drive, the faster you can create files.

This is evident from the fact that I ran your code on an ARM processor (Snapdragon 636, on a mobile phone using Termux). Mobile phones have flash storage that is very fast when it comes to I/O, so it ran in under 3 seconds most of the time and occasionally took 5 seconds. This variation is expected, as the drive has to handle reads and writes from multiple processes. You reported that it took about 37 seconds on your hardware. Hence you can safely conclude that I/O speed depends significantly on the hardware.


Nonetheless, I tried to optimize your code a little, using 2 different approaches:

  • Using the C counterpart for I/O

  • Using C++, but writing the data as one chunk in one go

I ran the simulation on my phone. I ran it 50 times, and here are the results:

  • C was the fastest, taking 2.73928 seconds on average to write your text to 10000 text files, using fprintf.

  • C++ writing the complete line in one go took 2.7899 seconds. I used sprintf to get the complete line into a char[] and then wrote it using the << operator on the ofstream.

  • C++ normal (your code) took 2.8752 seconds.

This behaviour is expected: writing in chunks is faster. Read this answer as to why. C was the fastest, no doubt.
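To illustrate the "one chunk" idea without a fixed-size char buffer, here is a minimal sketch that builds the whole line in a std::string and hands it to the stream with a single write call. The helper names build_line and write_file_one_go are hypothetical and were not part of the measured code, so no timing is claimed for this variant.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: assemble the complete line for file number i in memory.
std::string build_line(int i)
{
    const std::string n = std::to_string(i);
    return n + " some " + n + " constant " + n + " text " + n + " . . . "
         + n + " --more text-- " + n + " --even more-- " + n;
}

// Hypothetical helper: create the file and write the prepared line with one call.
// Returns false if the file could not be opened or the write failed.
bool write_file_one_go(const std::string& path, int i)
{
    std::ofstream outfile(path);
    if (!outfile) {
        return false;
    }
    const std::string line = build_line(i);
    outfile.write(line.data(), static_cast<std::streamsize>(line.size()));
    return static_cast<bool>(outfile);
}
```

Inside the loop this would be called as write_file_one_go("./results3/" + std::to_string(i) + ".txt", i). Whether it beats the sprintf version on a given machine would have to be measured.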

You may note that the difference here is not that significant, but on hardware with slow I/O it becomes significant.


Here is the code I used for the simulation. You can test it yourself, but make sure to replace the std::system arguments with your own commands (they are different on Windows).

#include <iostream>
#include <time.h>
#include <string>
#include <fstream>
#include <stdio.h>
#include <cstdlib>  // std::system

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
       // int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results/"+string_i+".txt";
        std::ofstream outfile(file_dir);

        // Here is the part where I am filling the .txt with some data
        outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
        << i << " --more text-- " << i << " --even more-- " << i;
        i++;
    }
}

void CreateFilesOneGo(){
    int i = 1;
    while(i<=10000){
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results3/" + string_i + ".txt";
        char buffer[256];
        sprintf(buffer,"%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
        std::ofstream outfile(file_dir);
        outfile << buffer;
        i++;
    }
}

void CreateFilesFast(){
    int i = 1;
    while(i<=10000){
        // int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = "./results2/"+string_i+".txt";
        FILE *f = fopen(file_dir.c_str(), "w");
        if (f == NULL) {
            // Could not open the file (e.g. missing directory): skip it
            i++;
            continue;
        }
        fprintf(f,"%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
        fclose(f);
        i++;
    }
}

int main()
{
    double normal = 0, one_go = 0, c = 0;
    for (int u=0;u<50;u++){
        std::system("mkdir results results2 results3");

        clock_t tStart1 = clock();
        CreateFiles();
        //printf("\nNormal : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        normal+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;

        tStart1 = clock();
        CreateFilesFast();
        //printf("\nIn C : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        c+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;

        tStart1 = clock();
        CreateFilesOneGo();
        //printf("\nOne Go : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
        one_go+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;

        std::system("rm -rf results results2 results3");
        std::cout<<"Completed "<<u+1<<"\n";
    }

    std::cout<<"C on average took : "<<c/50<<"\n";
    std::cout<<"Normal on average took : "<<normal/50<<"\n";
    std::cout<<"One Go C++ took : "<<one_go/50<<"\n";

    return 0;
}
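
If you would rather not depend on shell commands, the mkdir and rm -rf calls could be replaced by std::filesystem. Below is a minimal sketch, assuming a C++17 toolchain. The helper names reset_dirs, remove_dirs and wall_seconds are hypothetical and not part of the code I measured. The timing helper uses std::chrono::steady_clock because clock() is specified to report processor time, which on POSIX systems may not include time spent waiting on the drive.

```cpp
#include <chrono>
#include <filesystem>
#include <initializer_list>

namespace fs = std::filesystem;

// Hypothetical helper: make sure each directory exists and is empty
// (stands in for the "mkdir ..." shell command).
void reset_dirs(std::initializer_list<const char*> dirs)
{
    for (const char* d : dirs) {
        if (fs::exists(d)) {
            fs::remove_all(d);
        }
        fs::create_directories(d);
    }
}

// Hypothetical helper: delete each directory and its contents
// (stands in for the "rm -rf ..." shell command).
void remove_dirs(std::initializer_list<const char*> dirs)
{
    for (const char* d : dirs) {
        if (fs::exists(d)) {
            fs::remove_all(d);
        }
    }
}

// Hypothetical helper: run work() once and return the elapsed wall-clock seconds.
template <typename Work>
double wall_seconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In the loop above, this would read reset_dirs({"results", "results2", "results3"}); followed by normal += wall_seconds(CreateFiles); and so on, with remove_dirs at the end of each iteration. The numbers I reported were taken with clock(), so they are not directly comparable with wall-clock measurements.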

I also used clang-7.0 as the compiler.

If you have any other approach, let me know and I will test that too. If you find a mistake, do let me know and I will correct it as soon as possible.


 