Off-by-one errors when reading a file one chunk at a time

ifstream::readsome is notoriously unreliable for reading a file in chunks because its behavior is implementation-defined. In my case, MSVC returns 0 on a freshly opened file.

I looked at their implementation and realized they're just calling ifstream::read under the hood:

MSVC Implementation

streamsize __CLR_OR_THIS_CALL readsome(_Elem* _Str,
    streamsize _Count) { // read up to _Count characters into buffer, without blocking
    ios_base::iostate _State = ios_base::goodbit;
    _Chcount                 = 0;
    const sentry _Ok(*this, true);
    streamsize _Num;

    if (!_Ok) {
        _State |= ios_base::failbit; // no buffer, fail
    } else if ((_Num = _Myios::rdbuf()->in_avail()) < 0) {
        _State |= ios_base::eofbit; // no characters available
    } else if (0 < _Count && 0 < _Num) { // read available
        read(_Str, _Num < _Count ? _Num : _Count);
    }

    _Myios::setstate(_State);
    return gcount();
}
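
The dependence on in_avail() can be observed directly. The following is a minimal sketch, not part of the MSVC sources; ProbeStream and ProbeResult are hypothetical names introduced for illustration. It records what in_avail() and readsome report for a stream, then rewinds and performs a plain read for comparison. The first two values are implementation-defined, so no particular result should be expected from them.

```cpp
#include <cassert>
#include <istream>
#include <sstream> // std::istringstream, only needed for the usage note below
#include <vector>

// Hypothetical helper type: what each call reported for the same stream.
struct ProbeResult {
    std::streamsize in_avail;       // implementation-defined
    std::streamsize readsome_count; // implementation-defined
    std::streamsize read_count;     // characters delivered by read()
};

// Hypothetical helper: probes a seekable stream positioned at its start.
ProbeResult ProbeStream(std::istream& is, std::streamsize count) {
    assert(count > 0);
    std::vector<char> buffer(static_cast<std::size_t>(count));
    ProbeResult result{};
    result.in_avail = is.rdbuf()->in_avail();
    result.readsome_count = is.readsome(buffer.data(), count);
    is.clear();                 // drop any state bits set by readsome
    is.seekg(0, std::ios::beg); // rewind so read() starts from the beginning
    is.read(buffer.data(), count);
    result.read_count = is.gcount();
    return result;
}
```

Passing a freshly opened std::ifstream (or a std::istringstream, for a quick check) shows whether readsome_count and read_count disagree on a given implementation.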

So I implemented my own version that just calls ifstream::read:

My Implementation

std::optional<std::string> ReadSomeStringFromFile(std::ifstream& ifs, std::streampos pos, std::streamsize count) noexcept {
    if(ifs && ifs.is_open()) {
        auto result = std::string(count, '\0');
        ifs.seekg(pos);
        ifs.read(result.data(), count);
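        const auto read_count = ifs.gcount();
        if(read_count > 0) {
            //Trim the unread tail so a short final chunk carries no padding '\0' characters.
            result.resize(static_cast<std::size_t>(read_count));
            return result;
        }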
    }
    return {};
}

Usage:

std::streampos pos{0};
std::streamsize count{10};
std::ifstream ifs{g_options_filepath};
{
    auto stream = FileUtils::ReadSomeStringFromFile(ifs, pos, count);
    while(stream.has_value()) {
        DebuggerPrintf(stream.value().c_str());
        pos += count;
        stream = FileUtils::ReadSomeStringFromFile(ifs, pos, count);
    }
}

This works fine for binary files (I have a separate function for that), but the string version, where I need to preserve the newline characters, produces an off-by-one error whenever a chunk contains a newline. The last character of that chunk is duplicated as the first character of the next:

Expected output

difficulty=Normal
controlpref=Mouse
sound=5
music=5
cameraShakeStrength=1.000000

Actual output

difficulty=Normal
coontrolpref=Mouse
souund=5
musiic=5
camerraShakeStrength=1.000000

The buffer overload of ifstream::get treats the newline as a delimiter by default and does not store it (again, the newlines need to be preserved), which causes interleaved output and dropped characters:

difficult=Normalontrolpre=Mouseund=5ic=5raShakeStength=1.00000
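
The difference between the two get overloads can be shown in isolation. This is a minimal sketch rather than the code that produced the output above; ReadWithCharGet and ReadWithBufferGet are hypothetical names. The single-character overload returns '\n' like any other character, while the buffer overload stops before the delimiter and leaves it in the stream.

```cpp
#include <cassert>
#include <istream>
#include <sstream> // std::istringstream, only needed for the usage note below
#include <string>
#include <vector>

// Hypothetical helper: reads up to `count` characters one at a time.
// get(char&) hands back newlines unchanged.
std::string ReadWithCharGet(std::istream& is, std::streamsize count) {
    assert(count >= 0);
    std::string result;
    char ch{};
    while(count > 0 && is.get(ch)) {
        result.push_back(ch);
        --count;
    }
    return result;
}

// Hypothetical helper: reads with get(char*, n), which extracts at most n - 1
// characters, stops before '\n', and leaves the '\n' unextracted.
// If the very next character is '\n', nothing is extracted and failbit is set.
std::string ReadWithBufferGet(std::istream& is, std::streamsize count) {
    assert(count > 0);
    std::vector<char> buffer(static_cast<std::size_t>(count) + 1); // room for the terminator
    is.get(buffer.data(), count + 1);
    return std::string(buffer.data(), static_cast<std::size_t>(is.gcount()));
}
```

For the input "ab\ncd" and a count of 5, the first helper returns all five characters, while the second returns only "ab" and leaves the stream positioned on the newline.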

Question

Is there a way around trying to use unformatted input functions on formatted data, or should I just not try this with text data?

I don't use get very often, so I forgot it existed. Using this answer as a guide, I came up with a solution:

(I double-checked: the other answer, which uses formatted input as (ifs >> std::noskipws >> ch), gives the same results, even though get is specified as an unformatted input function.)

[[nodiscard]] std::optional<std::string> ReadSomeStringBufferFromFile(std::ifstream& ifs, std::streampos pos, std::streamsize count /*= 0u*/) noexcept {
    if(!(ifs && ifs.is_open())) {
        return {};
    }
    ifs.seekg(pos, std::ios::beg);

    //Would like to early-out here,
    //but MSVC ifstream::seekg doesn't set the fail bit,
    //so can't early-out until the get call.

    char ch{};
    std::string result{};
    result.reserve(count);
    bool readsome{false}; //If nothing read, make std::optional::has_value false.
    while(count > 0 && ifs.get(ch)) { //Check count first so no extra character is consumed.
        result.append(1, ch);
        --count;
        readsome = true;
    }
    return readsome ? std::make_optional(result) : std::nullopt;
}
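
The duplicated characters in the question are consistent with newline translation. Assuming the file uses CRLF line endings and the stream is opened in text mode (the default), each "\r\n" pair is delivered as a single '\n', so a count of characters read no longer matches a byte offset, and pos += count lands one byte short for every newline in the chunk. That is an assumption about the cause, not something confirmed above. Under that assumption, the following sketch avoids computing offsets at all: it reads consecutive chunks without re-seeking, from a stream the caller is assumed to have opened with std::ios::binary. ReadNextChunk and ReadAllChunks are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream> // std::istringstream, only needed for the usage note below
#include <string>
#include <vector>

// Hypothetical helper: reads the next chunk of up to `count` characters from the
// stream's current position. Returns an empty string once nothing is left.
// For files, the caller is assumed to open the stream with std::ios::binary so
// that no newline translation takes place.
std::string ReadNextChunk(std::istream& is, std::streamsize count) {
    assert(count > 0);
    std::vector<char> buffer(static_cast<std::size_t>(count));
    is.read(buffer.data(), count);
    // gcount() is the number of characters actually read; the final chunk may be short.
    return std::string(buffer.data(), static_cast<std::size_t>(is.gcount()));
}

// Hypothetical helper: collects every chunk until the stream is exhausted.
std::vector<std::string> ReadAllChunks(std::istream& is, std::streamsize count) {
    std::vector<std::string> chunks;
    for(std::string chunk = ReadNextChunk(is, count); !chunk.empty(); chunk = ReadNextChunk(is, count)) {
        chunks.push_back(chunk);
    }
    return chunks;
}
```

With a file, this would be used as std::ifstream ifs{g_options_filepath, std::ios::binary}; followed by ReadAllChunks(ifs, 10). The chunks then contain any "\r\n" pairs verbatim, so concatenating them reproduces the file byte for byte.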
