
Text still exists after the position of ifstream::gcount()

I wrote a text file, then read it into a string buffer larger than the file.

I expected no text after the position given by ifstream::gcount(), because the buffer was initialized with \0 characters.

But there was text. How is this possible?

Example code:

#include <iostream>
#include <string>
#include <fstream>

int main() {
    std::string path = "test.txt";

    // write to file (text mode is the default)
    std::ofstream out(path);
    for (int i = 1; i <= 10'000; ++i) {
        std::string lineNum = std::to_string(i);
        out << lineNum + "xxxxxxxxxxxxxxx" + lineNum + "\n";
    }
    out.close();  // flush everything before reading the file back

    // read from file (text mode is the default)
    std::ifstream in(path);
    std::string buffer;
    int bufferSize = 1'000'000;
    buffer.resize(bufferSize);  // fills the buffer with '\0'
    in.read(buffer.data(), buffer.size());

    auto gc = in.gcount();                            // characters actually read
    auto found = buffer.find('\n', gc);               // search past the read data
    std::string substr = buffer.substr(gc - 10, 100); // peek around that position

    std::cout << "gcount: " << gc << '\n';
    std::cout << "found: " << found << '\n';
    std::cout << "npos?: " << std::boolalpha << (found == std::string::npos) << '\n';
    std::cout << "substr:\n" << substr << std::endl;
}

Output:

gcount: 237788
found: 237810
npos?: false    // I thought `found` should be the same as `string::npos`.
01xxxxxxxxxxxxxxx9601     // I thought there should be no text after `gcount()`.

Built with MSVC as a 32-bit executable and run on Windows (x64).

P.S. I also tried a 64-bit build and got the same result
(using in.read(const_cast<char*>(buffer.data()), buffer.size()); instead of in.read(buffer.data(), buffer.size());).

@john's comment saved me.

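The culprit was that different OSs represent the newline character differently, and that a stream opened in text mode (the default) translates between the two representations. The output above is consistent with the runtime reading the raw bytes into the buffer and then collapsing each \r\n to \n in place: gcount() reports the translated length, and the raw bytes beyond it are left behind in the buffer.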

Opening the file in a hex editor shows the difference.

If the code in the question is run on Windows, each line of the file it writes ends with 0x0D 0x0A, that is, \r\n. On Unix or a Unix-like OS such as Linux, each line ends with just 0x0A, that is, \n.
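If no hex editor is at hand, the same check can be sketched in code. The helpers below are illustrative only: readRaw and hexDump are hypothetical names, not part of the question. The file is opened in binary mode so that the bytes on disk are shown without any translation.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: return the file's bytes exactly as stored on disk.
// Binary mode prevents the stream from translating line endings.
std::string readRaw(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: format each byte as two uppercase hex digits,
// separated by single spaces.
std::string hexDump(const std::string& bytes) {
    std::string out;
    char cell[4];
    for (unsigned char c : bytes) {
        std::snprintf(cell, sizeof cell, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += cell;
    }
    return out;
}
```

Calling hexDump(readRaw("test.txt").substr(0, 20)) on the file from the question should show whether the first line ends in 0D 0A or in 0A alone.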

With the std::ios_base::binary flag, however, a std::fstream performs no newline translation and writes or reads the bytes as-is. With that flag, a hex editor shows only 0x0A regardless of OS.
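One way to observe this without a hex editor is to compare file sizes. The sketch below uses a hypothetical helper, sizeAfterWrite, that writes a string with a given open mode and then measures the file in binary mode. In binary mode "a\n" should always occupy 2 bytes. In text mode the size is platform-dependent; on Windows it would be expected to be 3.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write `text` to `path` using `mode`, then return the
// number of bytes the file occupies on disk.
std::streamoff sizeAfterWrite(const std::string& path,
                              const std::string& text,
                              std::ios::openmode mode) {
    {
        std::ofstream out(path, mode);
        out << text;
    }  // the stream is flushed and closed at the end of this scope

    // Open at the end in binary mode so tellg() reports the raw byte count.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return static_cast<std::streamoff>(in.tellg());
}
```

For example, sizeAfterWrite("mode_test.txt", "a\n", std::ios::out | std::ios::binary) and the same call with std::ios::out alone can be compared on the target platform.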

So opening the ofstream, the ifstream, or both with ios_base::binary gets rid of the problem described in the question.
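A minimal sketch of the read side with that fix applied is shown below. The function name readUpTo is hypothetical, and shrinking the buffer to gcount() is an extra precaution added here rather than something the answer prescribes; it guarantees that nothing beyond the bytes actually read remains in the string.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read at most `capacity` bytes from `path` without
// newline translation and return only the bytes that were actually read.
std::string readUpTo(const std::string& path, std::size_t capacity) {
    std::ifstream in(path, std::ios::binary);  // no \r\n <-> \n translation
    std::string buffer(capacity, '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    // gcount() is the number of bytes the last read() stored; drop the rest.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

With the returned string, searching from the end, as in buffer.find('\n', buffer.size()), yields std::string::npos because there is nothing past the data that was read.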
