Off-by-one errors when reading a file one chunk at a time

ifstream::readsome is notoriously bad for reading chunks of a file because how much it returns is implementation-defined: it only copies characters that rdbuf()->in_avail() reports as already available. In my case, MSVC returns 0 on a freshly opened file.

I looked at MSVC's implementation and realized it just calls ifstream::read under the hood, limited to the number of characters in_avail() reports:

MSVC implementation

streamsize __CLR_OR_THIS_CALL readsome(_Elem* _Str,
    streamsize _Count) { // read up to _Count characters into buffer, without blocking
    ios_base::iostate _State = ios_base::goodbit;
    _Chcount                 = 0;
    const sentry _Ok(*this, true);
    streamsize _Num;

    if (!_Ok) {
        _State |= ios_base::failbit; // no buffer, fail
    } else if ((_Num = _Myios::rdbuf()->in_avail()) < 0) {
        _State |= ios_base::eofbit; // no characters available
    } else if (0 < _Count && 0 < _Num) { // read available
        read(_Str, _Num < _Count ? _Num : _Count);
    }

    _Myios::setstate(_State);
    return gcount();
}
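The in_avail() check above is why a fresh stream yields nothing: readsome() only hands back characters the stream buffer already holds, and before the first underflow() MSVC's filebuf reports none. A commonly used workaround is to call peek() first, which goes through sgetc() and makes the buffer fill itself. This is a sketch under the assumption that the filebuf actually buffers on underflow (true for the common implementations, but the standard does not say how much gets buffered); ReadSomeAfterPeek is a hypothetical name, not a library function.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: prime the buffer with peek() so readsome() has
// something to return. The result may be shorter than `count`, because
// readsome() never reads past what is currently buffered.
std::string ReadSomeAfterPeek(std::ifstream& ifs, std::streamsize count) {
    // peek() calls rdbuf()->sgetc(), which triggers underflow() and fills an
    // empty buffer; it returns eof() if nothing can be read.
    if (ifs.peek() == std::ifstream::traits_type::eof()) {
        return {};
    }
    std::string result(static_cast<std::size_t>(count), '\0');
    const std::streamsize got = ifs.readsome(result.data(), count);
    result.resize(static_cast<std::size_t>(got));  // keep only what was read
    return result;
}
```

This only helps with the "returns 0 on a fresh file" symptom; readsome() still stops at the end of whatever is buffered, so a loop built on it must keep calling until it gets an empty result.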

So I implemented my own version that calls ifstream::read directly:

My implementation

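#include <cstddef>
#include <fstream>
#include <optional>
#include <string>

std::optional<std::string> ReadSomeStringFromFile(std::ifstream& ifs, std::streampos pos, std::streamsize count) noexcept {
    if(ifs && ifs.is_open()) {
        auto result = std::string(count, '\0');
        ifs.seekg(pos);
        ifs.read(result.data(), count);
        if(ifs.gcount()) {
            //The last chunk may be short; drop the unused '\0' padding.
            result.resize(static_cast<std::size_t>(ifs.gcount()));
            return result;
        }
    }
    return {};
}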

Usage:

std::streampos pos{0};
std::streamsize count{10};
std::ifstream ifs{g_options_filepath};
{
    auto stream = FileUtils::ReadSomeStringFromFile(ifs, pos, count);
    while(stream.has_value()) {
        DebuggerPrintf(stream.value().c_str());
        pos += count;
        stream = FileUtils::ReadSomeStringFromFile(ifs, pos, count);
    }
}

This works fine for binary files (I have a separate function for that), but the string version, where I need to preserve the newline characters, produces an off-by-one error whenever a chunk contains a newline character. The last character of that chunk is duplicated as the first character of the next chunk:

Expected output

difficulty=Normal
controlpref=Mouse
sound=5
music=5
cameraShakeStrength=1.000000

Actual output

difficulty=Normal
coontrolpref=Mouse
souund=5
musiic=5
camerraShakeStrength=1.000000
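A likely explanation, assuming the options file has Windows (CRLF) line endings and is opened in text mode as above: each "\r\n" pair is delivered as a single '\n', so a chunk containing a newline consumes count + 1 bytes of the file while pos only advances by count. With count = 10, "difficulty" fills bytes 0-9; the next chunk, "=Normal\nco", is 10 characters but spans bytes 10-20, so pos = 20 lands on the 'o' of "controlpref" (its 'c' is byte 19). That reproduces "coontrolpref", and the same arithmetic gives "souund". Binary mode performs no translation, which would explain why the binary version works. One way around it is to ask the stream where the read actually stopped with tellg() instead of computing pos + count. ReadChunkAt and ReadAllInChunks are hypothetical names for this sketch:

```cpp
#include <cstddef>
#include <fstream>
#include <optional>
#include <string>

// Reads up to `count` characters starting at `pos` and stores in `next` the
// position where the read really stopped. In text mode on Windows, `next`
// can be past pos + count because "\r\n" is delivered as one '\n'.
std::optional<std::string> ReadChunkAt(std::ifstream& ifs, std::streampos pos,
                                       std::streamsize count, std::streampos& next) {
    ifs.clear();  // seekg() does nothing while failbit is set
    if (!ifs.seekg(pos)) {
        return std::nullopt;
    }
    std::string result(static_cast<std::size_t>(count), '\0');
    ifs.read(result.data(), count);
    const std::streamsize got = ifs.gcount();
    if (got == 0) {
        return std::nullopt;  // nothing left to read
    }
    result.resize(static_cast<std::size_t>(got));  // drop unused '\0' padding
    ifs.clear();         // a short final read sets eofbit|failbit; tellg() would return -1
    next = ifs.tellg();  // the real file position, not pos + count
    return result;
}

// Usage: walk the whole stream chunk by chunk, advancing with tellg().
std::string ReadAllInChunks(std::ifstream& ifs, std::streamsize count) {
    std::string all;
    std::streampos pos{0};
    std::streampos next{0};
    while (auto chunk = ReadChunkAt(ifs, pos, count, next)) {
        all += *chunk;
        pos = next;
    }
    return all;
}
```

Opening the file with std::ios::binary also keeps pos + count exact, at the cost of getting the raw "\r\n" pairs back in the string. If the chunks are always read in order, skipping seekg() entirely and letting the stream advance on its own avoids the position arithmetic as well.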

Using ifstream::get(char*, std::streamsize) instead doesn't work either: that overload uses the newline as its default delimiter, stops before it, and never stores it (again, the newlines need to be preserved), which causes interleaved output and dropped characters:

difficult=Normalontrolpre=Mouseund=5ic=5raShakeStength=1.00000
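The behaviour behind that output can be seen in isolation. The unformatted overload get(char*, n) stores at most n - 1 characters, stops before the delimiter ('\n' by default) and leaves it in the stream; a call that stores nothing sets failbit. This sketch uses a std::istringstream in place of the options file, and GetOnce and TwoGetsOnLines are hypothetical names:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Outcome of one call to std::istream::get(char*, n).
struct GetResult {
    std::string text;       // characters stored in the buffer
    std::streamsize count;  // gcount() right after the call
    bool failed;            // fail() right after the call
};

GetResult GetOnce(std::istream& in, std::streamsize n) {
    std::string buf(static_cast<std::size_t>(n), '\0');
    // Stops before '\n' without extracting it, then writes a terminating '\0'.
    in.get(buf.data(), n);
    return GetResult{buf.c_str(), in.gcount(), in.fail()};
}

// The first call stops at the newline; the second finds the newline
// immediately, stores nothing, and therefore sets failbit.
std::pair<GetResult, GetResult> TwoGetsOnLines() {
    std::istringstream in("sound=5\nmusic=5\n");
    GetResult first = GetOnce(in, 64);
    GetResult second = GetOnce(in, 64);
    return {first, second};
}
```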

Question

Is there a way to use unformatted input functions on text (formatted) data, or should I just not try this with text data?

I don't use get very often, so I forgot it existed. Using this answer as a guide, I came up with a solution:

(I double-checked: the other answer, which uses formatted input as ifs >> std::noskipws >> ch, gives the same results, even though the spec for get says it behaves as an unformatted input function.)

[[nodiscard]] std::optional<std::string> ReadSomeStringBufferFromFile(std::ifstream& ifs, std::streampos pos, std::streamsize count /*= 0u*/) noexcept {
    if(!(ifs && ifs.is_open())) {
        return {};
    }
    ifs.seekg(pos, std::ios::beg);

    //Would like to early-out here,
    //but MSVC ifstream::seekg doesn't set the fail bit,
    //so can't early-out until the get call.

    char ch{};
    std::string result{};
    result.reserve(count);
    bool readsome{false}; //If nothing read, make std::optional::has_value false.
    //Check count first: otherwise get() consumes one extra character
    //after the last one has been appended.
    while(count > 0 && ifs.get(ch)) {
        result.append(1, ch);
        --count;
        readsome = true;
    }
    return readsome ? std::make_optional(result) : std::nullopt;
}
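For comparison, the formatted-input variant mentioned in the parenthetical note can be sketched as follows. ReadCharsNoSkipWs is a hypothetical name; std::noskipws is what stops operator>> from discarding the spaces and newlines that need to be preserved. The count is tested before each extraction so that no character is consumed beyond the requested count, and unlike the function above it does not seek, so it reads sequentially from the stream's current position.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Reads up to `count` characters with operator>>, keeping whitespace.
std::optional<std::string> ReadCharsNoSkipWs(std::ifstream& ifs, std::streamsize count) {
    std::string result;
    char ch{};
    // Without std::noskipws, >> would silently skip ' ' and '\n'.
    while (count > 0 && ifs >> std::noskipws >> ch) {
        result.push_back(ch);
        --count;
    }
    if (result.empty()) {
        return std::nullopt;  // nothing read: stream exhausted or failed
    }
    return result;
}
```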
