Text still exists after the position of ifstream::gcount()
I wrote a text file, then read it into a string buffer larger than the file.
I thought there would be no text after the position given by ifstream::gcount(), because the buffer was initialized with \0s.
But there was text. How is this possible?
Example code:
#include <iostream>
#include <string>
#include <fstream>

int main() {
    std::string path = "test.txt";

    // write to file
    std::ofstream out(path);
    for (int i = 1; i <= 10'000; ++i) {
        std::string lineNum = std::to_string(i);
        out << lineNum + "xxxxxxxxxxxxxxx" + lineNum + "\n";
    }
    out.close();

    // read from file
    std::ifstream in(path);
    std::string buffer;
    int bufferSize = 1'000'000;
    buffer.resize(bufferSize);
    in.read(buffer.data(), buffer.size());
    auto gc = in.gcount();
    auto found = buffer.find('\n', gc);
    std::string substr = buffer.substr(gc - 10, 100);

    std::cout << "gcount: " << gc << '\n';
    std::cout << "found: " << found << '\n';
    std::cout << "npos?: " << std::boolalpha << (found == std::string::npos) << '\n';
    std::cout << "substr:\n" << substr << std::endl;
}
Result:
gcount: 237788
found: 237810
npos?: false // I thought `found` should be the same as `string::npos`.
substr:
xxxx10000
01xxxxxxxxxxxxxxx9601 // I thought there should be no text after `gcount()`.
9602xxxxxxxxxxxxxxx9602
9603xxxxxxxxxxxxxxx9603
9604xxxxxxxxxxxxx
Executed with MSVC, built for 32-bit, on Windows (x64).
P.S. I also tried building for 64-bit, with the same result (using in.read(const_cast<char*>(buffer.data()), buffer.size()); instead of in.read(buffer.data(), buffer.size());).
@john's comment saved me.
The culprit is that operating systems represent a line break differently, and a stream opened in text mode translates between \n and the platform's convention.
A hex editor shows the difference. If the code in the question is run on Windows, each line of the written file ends with 0x0D 0x0A, which is \r\n. On Unix or a Unix-like OS such as Linux, it would be just 0x0A, which is \n.
When the file is read back in text mode on Windows, each \r\n is translated to a single \n, so gcount() (237788) is smaller than the file's size on disk (247788 bytes: one extra \r for each of the 10,000 lines). The text found after gcount() is presumably left over from the untranslated bytes that the runtime placed in the buffer before compacting them.
But if we pass the std::ios_base::binary flag when opening a std::fstream, no newline translation is performed and the characters are written and read as-is. So, with that flag, a hex editor would show only 0x0A regardless of OS.
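The same check can be sketched without a hex editor. This is only an illustration: the helper names writeAndReadRaw and toHex and any file name passed in are hypothetical, not part of the original post. The file is always read back in binary mode so that the bytes on disk are seen untranslated.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes `text` to `path` in text or binary mode, then
// returns the raw bytes on disk (read back in binary mode, so untranslated).
std::string writeAndReadRaw(const std::string& path, const std::string& text, bool binary) {
    {
        std::ofstream out(path, binary ? std::ios::out | std::ios::binary : std::ios::out);
        assert(out.is_open());  // sanity check for this sketch only
        out << text;
    }  // `out` is flushed and closed here
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Hypothetical helper: formats bytes as space-separated uppercase hex pairs.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    for (unsigned char c : bytes) {
        if (!hex.empty()) hex += ' ';
        hex += digits[c >> 4];
        hex += digits[c & 0x0F];
    }
    return hex;
}
```

Calling toHex(writeAndReadRaw("demo.txt", "a\n", true)) should give 61 0A on any platform. With false as the last argument, a Windows build would be expected to give 61 0D 0A instead.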
So opening the ofstream, the ifstream, or both with ios_base::binary gets rid of the problem described in the question.