
Why does reading a file using istreambuf iterators get faster with repeated execution?

I was looking for a way to read a whole file into a string. I found a few techniques on the internet and decided to put two of them to the test, but the results were strange.

I'm using Visual Studio Community 2019 (Version 16.0.3) on a Windows 10 laptop. The file "my_text.txt" is 2,235,259 characters long and 2.183 MB in size.

Here is the complete code:

#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <string>

// first technique
void read_string_1(std::ifstream& fstr, std::string& result)
{
    fstr.seekg(0, std::ios::end);
    size_t length = fstr.tellg();
    fstr.seekg(0);
    result = std::string(length + 1, '\0');
    fstr.read(&result[0], length);
}

// second technique
void read_string_2(std::ifstream& fstr, std::string& result)
{
    result = std::string( (std::istreambuf_iterator<char>(fstr)), (std::istreambuf_iterator<char>()) );
}

int main()
{
    std::ifstream ifile{ "my_text.txt", std::ios_base::binary };
    if (!ifile)
        throw std::runtime_error("Error!");

    std::string content;

    for (int i = 0; i < 10; ++i)
    {
        std::chrono::high_resolution_clock::time_point p1 = std::chrono::high_resolution_clock::now();
        read_string_1(ifile, content);
        std::chrono::high_resolution_clock::time_point p2 = std::chrono::high_resolution_clock::now();
        auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(p2 - p1).count();
        std::cout << "M1:" << duration1 << std::endl;
    }

    for (int i = 0; i < 10; ++i)
    {
        std::chrono::high_resolution_clock::time_point p3 = std::chrono::high_resolution_clock::now();
        read_string_2(ifile, content);
        std::chrono::high_resolution_clock::time_point p4 = std::chrono::high_resolution_clock::now();
        auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>(p4 - p3).count();
        std::cout << "M2:" << duration2 << std::endl;
    }

    return 0;
}

And here are the results:

Case 1: call read_string_1() first, then call read_string_2().

M1:7389
M1:8821
M1:6303
M1:6725
M1:5951
M1:8097
M1:5651
M1:6156
M1:6110
M1:5848
M2:827
M2:15
M2:15
M2:15
M2:14
M2:13
M2:14
M2:13
M2:14
M2:14

Case 2: call read_string_2() first, then read_string_1().

M1:940311
M1:352
M1:16
M1:13
M1:15
M1:15
M1:13
M1:13
M1:14
M1:14
M2:4668
M2:4761
M2:4881
M2:7446
M2:5050
M2:5572
M2:5255
M2:5108
M2:5234
M2:5072

Of course the results differ each time, but they follow a general pattern. As you can see, read_string_1() is pretty consistent, but the execution times of read_string_2() are puzzling. Why, in both cases, does it get faster with repeated execution? Why, in case 2, does it take so long to execute on the first run? What's happening in the background? Am I doing something wrong? And in the end, which function is faster, read_string_1() or read_string_2()?

Execution becomes faster because of caching.

With seeking, it takes time to go through the file, so while some things are cached, the difference is not so big. With a direct read, the file content itself can be cached, so reading it again amounts to little more than following a pointer to cached memory.
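Before attributing the whole speedup to caching, it may be worth checking how much data each repeated call actually returns. In the posted program, read_string_1() seeks back to the start on every call, but read_string_2() never does. Once a stream has been consumed to the end, an istreambuf_iterator range built from it is empty, so later calls could finish in microseconds simply because there is nothing left to read. The following is a minimal sketch of that effect, not the asker's code: the names read_rest, read_all_rewound and write_sample_file are made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Reads from the stream's current position to the end
// (the same idea as the question's second technique).
std::string read_rest(std::ifstream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Same read, but rewinds first so every call sees the whole file.
std::string read_all_rewound(std::ifstream& in)
{
    in.clear();   // drop any error flags left by an earlier read
    in.seekg(0);  // move back to the start of the file
    return read_rest(in);
}

// Hypothetical helper: writes `size` bytes to `path` so the sketch
// has a file of known length to read.
void write_sample_file(const std::string& path, std::size_t size)
{
    std::ofstream out(path, std::ios_base::binary);
    out << std::string(size, 'x');
}
```

Calling read_rest() twice on the same open stream should give the full content the first time and an empty string the second time, while read_all_rewound() returns the full content on every call. Printing content.size() next to each timing in the original loops would show whether this is what happened there.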

How long it takes on the first try depends on what's in the cache and on the operation itself.
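To see how the first read compares with later ones, one option is to time each read on a freshly opened stream and record the number of bytes obtained alongside the duration. That way every measurement starts at offset 0 and a suspiciously fast run can be checked against the byte count. This is only a sketch under those assumptions: ReadSample and timed_read are hypothetical names, and std::chrono::steady_clock is used here instead of the question's high_resolution_clock.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Result of one timed read: bytes obtained and elapsed microseconds.
struct ReadSample
{
    std::size_t bytes;
    long long microseconds;
};

// Opens `path` afresh, reads it with istreambuf iterators, and times
// only the read. A new stream per call guarantees the read starts at
// the beginning of the file. If the file cannot be opened, bytes is 0.
ReadSample timed_read(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    const auto start = std::chrono::steady_clock::now();
    const std::string content((std::istreambuf_iterator<char>(in)),
                              std::istreambuf_iterator<char>());
    const auto stop = std::chrono::steady_clock::now();
    return { content.size(),
             std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count() };
}
```

Calling timed_read("my_text.txt") in a loop and printing both fields gives timings that are comparable across iterations. Any remaining gap between the first and later runs is then more plausibly due to caching than to the stream position.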
