Off-by-one errors when reading a file one chunk at a time
ifstream::readsome is notoriously bad for reading chunks of a file because its behavior is implementation-defined. In my case, MSVC returns 0 on a freshly opened file. I looked at their implementation and realized it just calls ifstream::read under the hood, capped at whatever rdbuf()->in_avail() reports as already available:
MSVC Implementation
streamsize __CLR_OR_THIS_CALL readsome(_Elem* _Str,
    streamsize _Count) { // read up to _Count characters into buffer, without blocking
    ios_base::iostate _State = ios_base::goodbit;
    _Chcount = 0;
    const sentry _Ok(*this, true);
    streamsize _Num;
    if (!_Ok) {
        _State |= ios_base::failbit; // no buffer, fail
    } else if ((_Num = _Myios::rdbuf()->in_avail()) < 0) {
        _State |= ios_base::eofbit; // no characters available
    } else if (0 < _Count && 0 < _Num) { // read available
        read(_Str, _Num < _Count ? _Num : _Count);
    }
    _Myios::setstate(_State);
    return gcount();
}
So I implemented my own that just calls ifstream::read:
My Implementation
std::optional<std::string> ReadSomeStringFromFile(std::ifstream& ifs, std::streampos pos, std::streamsize count) noexcept {
    if(ifs && ifs.is_open()) {
        auto result = std::string(count, '\0');
        ifs.seekg(pos);
        ifs.read(result.data(), count);
        if(ifs.gcount()) {
            //A short read at end-of-file leaves '\0' padding; keep only what was read.
            result.resize(ifs.gcount());
            return result;
        }
    }
    return {};
}
Usage:
std::streampos pos{0};
std::streamsize count{10};
std::ifstream ifs{g_options_filepath};
{
    auto stream = FileUtils::ReadSomeStringFromFile(ifs, pos, count);
    while(stream.has_value()) {
        DebuggerPrintf(stream.value().c_str());
        pos += count;
        stream = FileUtils::ReadSomeStringFromFile(ifs, pos, count);
    }
}
This works fine for binary files (I have a separate function for that), but the string version, where I need to preserve the newline characters, produces an off-by-one error whenever a chunk contains a newline character. The last character of that chunk is duplicated as the first character of the next one:
Expected output
difficulty=Normal
controlpref=Mouse
sound=5
music=5
cameraShakeStrength=1.000000
Actual output
difficulty=Normal
coontrolpref=Mouse
souund=5
musiic=5
camerraShakeStrength=1.000000
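A likely explanation, assuming the file has Windows CRLF line endings and is opened in text mode (the default for std::ifstream): on Windows, text-mode input turns each "\r\n" pair into a single '\n', so a 10-character chunk that contains a newline consumes 11 bytes of the file, while the loop only advances pos by count. The next seekg lands one byte early, which reproduces the duplicated characters above ("=Normal\nco" ends at byte 20, and the next chunk starts at byte 20 with 'o' again). One way around the arithmetic is to resume from the position the stream itself reports through tellg(), since positions obtained from tellg() are the ones a text stream is meant to accept back in seekg(); opening with std::ios::binary and handling the '\r' characters yourself is another option. The sketch below takes the tellg() route; ReadChunk, ReadAllInChunks and the next parameter are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper (not from the original post): reads up to `count`
// characters starting at `pos`. On success, `next` receives the position the
// stream reports after the read, or -1 once the end of the file was reached.
std::optional<std::string> ReadChunk(std::ifstream& ifs, std::streampos pos,
                                     std::streamsize count, std::streampos& next) {
    assert(count > 0);
    ifs.clear();  // a short read at EOF leaves eofbit|failbit set, which would block seekg
    ifs.seekg(pos);
    if (!ifs) {
        return std::nullopt;
    }
    std::string result(static_cast<std::string::size_type>(count), '\0');
    ifs.read(result.data(), count);
    const std::streamsize got = ifs.gcount();
    if (got == 0) {
        return std::nullopt;
    }
    result.resize(static_cast<std::string::size_type>(got));  // drop unused '\0' padding
    // Resume where the stream says it is rather than at pos + count: in text
    // mode a chunk may consume more bytes than the characters it produces.
    next = ifs ? ifs.tellg() : std::streampos(-1);
    return result;
}

// Hypothetical driver: reads the whole file chunk by chunk and concatenates it.
std::string ReadAllInChunks(const std::string& path, std::streamsize count) {
    std::ifstream ifs{path};
    std::string text;
    std::streampos pos{0};
    while (pos != std::streampos(-1)) {
        std::streampos next{-1};
        auto chunk = ReadChunk(ifs, pos, count, next);
        if (!chunk) {
            break;
        }
        text += *chunk;
        pos = next;
    }
    return text;
}
```

If the caller never needs random access, calling read() repeatedly without any seekg() avoids position bookkeeping altogether.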
Using ifstream::get, which by default treats the newline as a delimiter and stops without extracting it (again, the newlines need to be preserved), causes interleaved output and dropped characters:
difficult=Normalontrolpre=Mouseund=5ic=5raShakeStength=1.00000
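The interleaved output is consistent with two documented properties of the array overload std::istream::get(char*, count): it stores at most count - 1 characters, because the last slot is reserved for the terminating '\0', and it stops before the '\n' delimiter, leaving it in the stream. With count = 10 that is why the first chunk comes out as "difficult" rather than "difficulty", and why the next chunk ends at "=Normal". A minimal sketch on an in-memory stream; GetChunk and FirstTwoChunks are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads one chunk with the array overload std::istream::get(char*, n).
// At most n - 1 characters are stored (the last slot receives '\0'), and
// extraction stops *before* a '\n', which stays in the stream.
std::string GetChunk(std::istream& is, std::streamsize n) {
    assert(n > 0);
    std::string buf(static_cast<std::string::size_type>(n), '\0');
    is.get(buf.data(), n);
    buf.resize(static_cast<std::string::size_type>(is.gcount()));  // keep only extracted chars
    return buf;
}

// Reads two consecutive chunks from an in-memory copy of the text.
std::string FirstTwoChunks(const std::string& text, std::streamsize n) {
    std::istringstream in{text};
    const std::string first = GetChunk(in, n);
    const std::string second = GetChunk(in, n);
    return first + "|" + second;
}
```

For "difficulty=Normal\ncontrolpref=Mouse\n" and n = 10, FirstTwoChunks returns "difficult|y=Normal".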
Question
Is there a way around using unformatted input functions on formatted (text) data, or should I just not try this with text data?
I don't use get very often, so I forgot the single-character overload existed. Using this answer as a guide, I came up with a solution:
(I double-checked: the other answer, which uses formatted input as (ifs >> std::noskipws >> ch), gives the same results, even though operator>> is a FormattedInputFunction and the spec describes get as an UnformattedInputFunction.)
[[nodiscard]] std::optional<std::string> ReadSomeStringBufferFromFile(std::ifstream& ifs, std::streampos pos, std::streamsize count /*= 0u*/) noexcept {
    if(!(ifs && ifs.is_open())) {
        return {};
    }
    ifs.seekg(pos, std::ios::beg);
    //Would like to early-out here,
    //but MSVC ifstream::seekg doesn't set the fail bit,
    //so can't early-out until the get call.
    char ch{};
    std::string result{};
    result.reserve(count);
    //Check count first so get() doesn't consume one character past the chunk.
    while(count > 0 && ifs.get(ch)) {
        result.append(1, ch);
        --count;
    }
    //If nothing was read, make std::optional::has_value false.
    return result.empty() ? std::nullopt : std::make_optional(result);
}