
Text still exists after the position of ifstream::gcount()

I wrote a text file, then read the file into a string buffer larger than the file.

I thought there would be no text after the position given by ifstream::gcount(), because the buffer was initialized with \0 characters.

But there was text. How is this possible?

Example code:

#include <iostream>
#include <string>
#include <fstream>

int main() {
    std::string path = "test.txt";

    // write to file
    std::ofstream out(path);    
    for (int i = 1; i <= 10'000; ++i) {
        std::string lineNum = std::to_string(i);
        out << lineNum + "xxxxxxxxxxxxxxx" + lineNum + "\n"; 
    } 
    out.close();

    // read from file
    std::ifstream in(path); 
    std::string buffer;
    int bufferSize = 1'000'000; 
    buffer.resize(bufferSize); 
    in.read(buffer.data(), buffer.size()); 

    auto gc = in.gcount(); 
    auto found = buffer.find('\n', gc); 
    std::string substr = buffer.substr(gc - 10, 100); 

    std::cout << "gcount: " << gc << '\n'; 
    std::cout << "found: " << found << '\n'; 
    std::cout << "npos?: " << std::boolalpha << (found == std::string::npos) << '\n'; 
    std::cout << "substr:\n" << substr << std::endl;    
}

Result:

gcount: 237788
found: 237810
npos?: false    // I thought `found` should be the same as `string::npos`.
substr:         
xxxx10000
01xxxxxxxxxxxxxxx9601     // I thought there should be no text after `gcount()`.
9602xxxxxxxxxxxxxxx9602
9603xxxxxxxxxxxxxxx9603
9604xxxxxxxxxxxxx

Built with MSVC as a 32-bit executable and run on Windows (x64).

PS: I also tried a 64-bit build, with the same result.
(I used in.read(const_cast<char*>(buffer.data()), buffer.size()); instead of in.read(buffer.data(), buffer.size());)

@john's comment saved me.

The culprit was that operating systems represent a newline differently, and that a file stream opened in text mode (the default) translates between those representations.

If opened with a hex editor, we can see the difference.如果用十六进制编辑器打开,我们可以看到差异。

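If you run the code from the question on Windows, each line of the file it writes ends with 0x0D 0x0A, which is \r\n. On Unix or a Unix-like OS such as Linux, it would be just 0x0A, which is \n. When the file is read back in text mode on Windows, each \r\n is converted to a single \n, so gcount() reports fewer characters than the file has bytes. The text found past gcount() is presumably raw file data that was placed in the buffer before that conversion.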
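If no hex editor is at hand, the same check can be sketched in code. The helper below, `countLineEndingBytes`, and its result type `LineEndingCounts` are hypothetical names, not part of the original post. It opens the file in binary mode so that nothing is translated, then counts the raw 0x0D and 0x0A bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Raw line-ending byte counts for a file (hypothetical helper type).
struct LineEndingCounts {
    std::size_t cr = 0;  // number of 0x0D ('\r') bytes
    std::size_t lf = 0;  // number of 0x0A ('\n') bytes
};

// Opens the file in binary mode so no newline translation takes place,
// then counts the raw carriage-return and line-feed bytes.
// A file that cannot be opened yields zero for both counts.
LineEndingCounts countLineEndingBytes(const std::string& path) {
    std::ifstream in(path, std::ios_base::binary);
    LineEndingCounts counts;
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        if (*it == '\r') {
            ++counts.cr;
        } else if (*it == '\n') {
            ++counts.lf;
        }
    }
    return counts;
}
```

For the test.txt written by the question's code, one would expect `cr` and `lf` to be equal on Windows and `cr` to be 0 on Linux.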

But if we pass the std::ios_base::binary flag when opening a std::fstream, newline characters are not translated; they are written and read "as-is". So, with that option, a hex editor would show only 0x0A regardless of OS.

So opening the ofstream, the ifstream, or both with the ios_base::binary option gets rid of the problem described in the question.
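A minimal sketch of the reading side with that option applied. `readUpTo` is a hypothetical helper, not code from the question. Besides opening the file with `std::ios_base::binary`, it shrinks the buffer to `gcount()`, so that whatever lies beyond the characters actually read cannot be searched by mistake.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads at most maxBytes raw bytes from the file at path and returns
// exactly the bytes that were read. Hypothetical helper for illustration.
std::string readUpTo(const std::string& path, std::size_t maxBytes) {
    std::ifstream in(path, std::ios_base::binary);  // no newline translation
    std::string buffer(maxBytes, '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    // gcount() is the number of characters the last read() stored;
    // keep only that prefix and drop the rest of the buffer.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

The `resize` call is a design choice rather than part of the original fix: after it, a call such as `buffer.find('\n', pos)` can only see characters that `read()` reported.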


 