
Confused about usage of `std::istreambuf_iterator`

I've implemented a deserialization routine for an object using the >> stream operator. The routine itself uses an istreambuf_iterator<char> to extract characters from the stream one by one, in order to construct the object.

Ultimately, my goal is to be able to iterate over a stream using an istream_iterator<MyObject> and insert each object into a vector. Pretty standard, except I'm having trouble getting the istream_iterator to stop iterating when it hits end-of-stream. Right now, it just loops forever, even though calls to istream::tellg() indicate I'm at the end of the file.

Here's code to reproduce the problem:

#include <fstream>
#include <iostream>
#include <iterator>

struct Foo
{
    Foo() { }    
    Foo(char a_, char b_) : a(a_), b(b_) { }

    char a;
    char b;
};

// Output stream operator
std::ostream& operator << (std::ostream& os, const Foo& f)
{
    os << f.a << f.b;
    return os;
}

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    if (is.good()) 
    {
        std::istreambuf_iterator<char> it(is);
        std::istreambuf_iterator<char> end;

        if (it != end) {
            f.a = *it++;
            f.b = *it++;
        }
    }
    return is;
}

int main()
{
    {
        std::ofstream ofs("foo.txt");
        ofs << Foo('a', 'b') << Foo('c', 'd');
    }

    std::ifstream ifs("foo.txt");
    std::istream_iterator<Foo> it(ifs);
    std::istream_iterator<Foo> end;
    for (; it != end; ++it) std::cout << *it << std::endl; // iterates infinitely
}

I know in this trivial example I don't even need istreambuf_iterator, but I'm just trying to simplify the problem so it's more likely people will answer my question.

So the problem here is that even though the istreambuf_iterator reaches the end of the stream buffer, the stream itself doesn't enter an EOF state. Calls to istream::eof() return false, even though istream::tellg() reports a position at the end of the file, and istreambuf_iterator<char>(ifs) compares equal to istreambuf_iterator<char>(), meaning I'm definitely at the end of the stream.

I looked at the IOstreams library code to see exactly how it's determining whether an istream_iterator is at the end position, and basically it checks whether istream::operator void*() const evaluates to true. This istream library function simply returns:

return this->fail() ? 0 : const_cast<basic_ios*>(this);

In other words, it returns 0 (false) if the failbit is set. It then compares this value to the same value in a default-constructed instance of istream_iterator to determine if we're at the end.

So I tried manually setting the failbit in my std::istream& operator >> (std::istream& is, Foo& f) routine when the istreambuf_iterator compares true to the end iterator. This worked perfectly, and properly terminated the loop. But now I'm really confused. It seems that istream_iterator definitely checks for std::ios::failbit in order to signify an "end-of-stream" condition. But isn't that what std::ios::eofbit is for? I thought failbit was for error conditions, like for example if the underlying file of an fstream couldn't be opened or something.

So, why do I need to call istream::setstate(std::ios::failbit) to get the loop to terminate?

When you use istreambuf_iterator, you are manipulating the underlying streambuf object of the istream object. The streambuf object doesn't know anything about its owner (the istream object), so calling functions on the streambuf object does not change the istream object. That's why the flags in the istream object are not set when you reach EOF.
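The separation can be observed directly. The following is a minimal sketch, assuming an in-memory std::istringstream stands in for the file; drain_via_streambuf and demo_streambuf_vs_istream are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>

// Hypothetical helper: consume everything through the stream buffer and
// return the number of characters seen. istreambuf_iterator only talks to
// the streambuf, so the istream's state flags are never touched here.
std::size_t drain_via_streambuf(std::istream& is)
{
    std::size_t n = 0;
    for (std::istreambuf_iterator<char> it(is), end; it != end; ++it)
        ++n;
    return n;
}

void demo_streambuf_vs_istream()
{
    std::istringstream in("abcd");
    assert(drain_via_streambuf(in) == 4);

    // The buffer is exhausted, yet the stream still reports a clean state.
    assert(in.good());
    assert(!in.eof());

    // An istream-level read is what notices EOF and records it.
    char c;
    in.get(c);
    assert(in.eof() && in.fail());
}
```

Calling demo_streambuf_vs_istream() from main should pass every assertion: only the final get(), which goes through the istream interface, changes the flags.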

Do something like this:

std::istream& operator >> (std::istream& is, Foo& f)
{
    is.read(&f.a, sizeof(f.a));
    is.read(&f.b, sizeof(f.b));
    return is;
}
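As a rough check that this extractor gives the loop a way to stop, here is a sketch that fills a vector from an in-memory stream. The aggregate Foo, the read_all helper and the demo function are assumptions made for the example, not part of the original code.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

struct Foo
{
    char a;
    char b;
};

// read() sets eofbit | failbit on the stream itself when it cannot deliver
// the requested bytes, which is what istream_iterator looks at.
std::istream& operator>>(std::istream& is, Foo& f)
{
    is.read(&f.a, sizeof(f.a));
    is.read(&f.b, sizeof(f.b));
    return is;
}

// Hypothetical helper: collect every Foo the stream can deliver.
std::vector<Foo> read_all(std::istream& is)
{
    return std::vector<Foo>(std::istream_iterator<Foo>(is),
                            std::istream_iterator<Foo>());
}

void demo_read_all()
{
    std::istringstream in("abcd");
    std::vector<Foo> v = read_all(in);
    assert(v.size() == 2);
    assert(v[0].a == 'a' && v[0].b == 'b');
    assert(v[1].a == 'c' && v[1].b == 'd');
    assert(in.eof() && in.fail());   // the failed third read ended the loop
}
```

With an odd number of bytes, the trailing byte is consumed but no Foo is produced for it, because the second read() fails.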

Edit

I was stepping through the code in my debugger, and this is what I found. istream_iterator has two internal data members: a pointer to the associated istream object, and an object of the template type (Foo in this case). When you call ++it, it calls this function:

void _Getval()
{    // get a _Ty value if possible
    if (_Myistr != 0 && !(*_Myistr >> _Myval))
        _Myistr = 0;
}

_Myistr is the istream pointer, and _Myval is the Foo object. If you look here:

!(*_Myistr >> _Myval)

That's where it calls your operator>> overload, and then it calls operator! on the returned istream object. operator! returns true only if failbit or badbit is set; eofbit alone doesn't do it.

So if either failbit or badbit is set, the istream pointer is set to NULL. The next time you compare the iterator to the end iterator, it compares the istream pointers, which are NULL in both.
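The flag behaviour behind that test can be checked in isolation. This is a small sketch, assuming the flags are set by hand on an std::istringstream; would_stop and demo_state_test are hypothetical names.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

// Hypothetical helper: true when a "!(stream >> value)" style test would
// end the iteration. operator! is equivalent to fail(), which reports
// failbit or badbit and ignores eofbit.
bool would_stop(const std::ios& s)
{
    return !s;
}

void demo_state_test()
{
    std::istringstream in("x");

    in.setstate(std::ios_base::eofbit);
    assert(!would_stop(in));          // eofbit alone does not stop the loop

    in.setstate(std::ios_base::failbit);
    assert(would_stop(in));           // failbit does

    in.clear(std::ios_base::badbit);  // state is now exactly badbit
    assert(would_stop(in));           // badbit does too
}
```

This matches the observation in the question: a stream that has only reached EOF still tests as usable, so the iterator keeps going.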

Your outer loop, where you're checking for your istream_iterator to have reached its end, is tied to state stored in the istream's inherited ios_base. The state on the istream represents the outcome of recent extraction operations performed against the istream itself, not the state of its underlying streambuf.

Your inner loop, where you're using istreambuf_iterator to extract characters from the streambuf, is using lower-level functions like basic_streambuf::sgetc() (for operator*) and basic_streambuf::sbumpc() (for operator++). Neither of those functions sets state flags as a side effect, apart from the second one advancing basic_streambuf::gptr.

Your inner loop works fine, but packaged as it is inside operator>>, it violates the contract of std::basic_istream& operator>>(std::basic_istream&, T&). If the function fails to extract an element as intended, it must call basic_ios::setstate(failbit) and, if it also encountered end-of-stream while extracting, it must also call basic_ios::setstate(eofbit). Your extractor function sets neither flag when it fails to extract a Foo.

I concur with the other advice here to avoid use of istreambuf_iterator for implementing an extraction operator meant to work at the istream level. You're forcing yourself to do extra work to maintain the istream contract, which will cause other downstream surprises like the one that brought you here.

In your operator>> you should set failbit any time you fail to read a Foo. Additionally, you should set eofbit any time you detect end of file. This could look like this:

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    if (is.good()) 
    {
        std::istreambuf_iterator<char> it(is);
        std::istreambuf_iterator<char> end;

        std::ios_base::iostate err = it == end ? (std::ios_base::eofbit |
                                                  std::ios_base::failbit) :
                                                 std::ios_base::goodbit;
        if (err == std::ios_base::goodbit) {
            char a = *it;
            if (++it != end)
            {
                char b = *it;
                if (++it == end)
                    err = std::ios_base::eofbit;
                f.a = a;
                f.b = b;
            }
            else
                err = std::ios_base::eofbit | std::ios_base::failbit;
        }
        if (err)
            is.setstate(err);
    }
    else
        is.setstate(std::ios_base::failbit);
    return is;
}

With this extractor, which sets failbit on failure to read and eofbit on detecting end of file, your driver works as expected. Note especially that even if your outer if (is.good()) fails, you still need to set failbit. Your stream might be !good() because only eofbit is set.
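To see the termination behaviour end to end, here is a sketch built around a condensed variation of that extractor. It follows the same flag rules but is not the code above verbatim; the aggregate Foo, the read_all helper and the in-memory streams are assumptions made for the example.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

struct Foo
{
    char a;
    char b;
};

// Condensed variation: failbit when no complete Foo was read,
// eofbit when the stream buffer is found to be exhausted.
std::istream& operator>>(std::istream& is, Foo& f)
{
    if (!is.good()) {
        is.setstate(std::ios_base::failbit);   // e.g. only eofbit was set
        return is;
    }

    std::istreambuf_iterator<char> it(is), end;
    char buf[2];
    int n = 0;
    for (; n < 2 && it != end; ++it)
        buf[n++] = *it;

    std::ios_base::iostate err = std::ios_base::goodbit;
    if (it == end)
        err |= std::ios_base::eofbit;          // nothing left in the buffer
    if (n < 2)
        err |= std::ios_base::failbit;         // incomplete Foo
    else {
        f.a = buf[0];
        f.b = buf[1];
    }
    is.setstate(err);
    return is;
}

// Hypothetical helper: collect every Foo the stream can deliver.
std::vector<Foo> read_all(std::istream& is)
{
    return std::vector<Foo>(std::istream_iterator<Foo>(is),
                            std::istream_iterator<Foo>());
}

void demo_flagged_extractor()
{
    std::istringstream even("abcd");
    std::vector<Foo> v = read_all(even);
    assert(v.size() == 2 && v[0].a == 'a' && v[1].b == 'd');
    assert(even.eof() && even.fail());

    std::istringstream odd("abc");
    assert(read_all(odd).size() == 1);         // the lone 'c' is not a Foo
}
```

The second Foo is still delivered even though reading it sets eofbit, because eofbit alone does not make the stream test as failed. The loop ends on the following attempt, when the !good() branch sets failbit.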

You can slightly simplify the above by using an istream::sentry for the outer test. If the sentry fails, it will set failbit for you:

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    std::istream::sentry ok(is);
    if (ok) 
    {
        std::istreambuf_iterator<char> it(is);
        std::istreambuf_iterator<char> end;

        std::ios_base::iostate err = it == end ? (std::ios_base::eofbit |
                                                  std::ios_base::failbit) :
                                                 std::ios_base::goodbit;
        if (err == std::ios_base::goodbit) {
            char a = *it;
            if (++it != end)
            {
                char b = *it;
                if (++it == end)
                    err = std::ios_base::eofbit;
                f.a = a;
                f.b = b;
            }
            else
                err = std::ios_base::eofbit | std::ios_base::failbit;
        }
        if (err)
            is.setstate(err);
    }
    return is;
}

The sentry also skips leading whitespace. This may or may not be what you want. If you don't want the sentry to skip leading whitespace you can construct it with:

    std::istream::sentry ok(is, true);

If the sentry detects end of file while skipping leading whitespace, it will set both failbit and eofbit.
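The two sentry modes can be compared with a short sketch, assuming an in-memory stream with the default skipws flag; peek_after_sentry and demo_sentry are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: construct a sentry, then report the next character
// the stream would deliver, or EOF if the sentry did not succeed.
int peek_after_sentry(std::istream& is, bool noskipws)
{
    std::istream::sentry ok(is, noskipws);
    return ok ? is.peek() : std::char_traits<char>::eof();
}

void demo_sentry()
{
    std::istringstream skipping("  x");
    assert(peek_after_sentry(skipping, false) == 'x');   // whitespace skipped

    std::istringstream keeping("  x");
    assert(peek_after_sentry(keeping, true) == ' ');     // whitespace kept

    std::istringstream blank("   ");
    assert(peek_after_sentry(blank, false) == std::char_traits<char>::eof());
    assert(blank.eof() && blank.fail());                 // both flags set
}
```

For a fixed-width binary-style record such as Foo, the noskipws form is probably the one you want, since a space could be a legitimate value for a or b.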

It looks like the two sets of stream iterators are interfering with each other.

I got it working with this:

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    f.a = is.get();
    f.b = is.get();

    return is;
}

I think your end condition needs to use the .equal() method, instead of using the comparison operator.

for (; !it.equal(end); ++it) cout << *it << endl;

I usually see this implemented with a while loop instead of a for loop:

while ( !it.equal(end)) {
    cout << *it++ << endl;
}

I think those two will have the same effect, but (to me) the while loop is clearer.

Note: You have a number of other spots where you're using the comparison operator to check whether an iterator is at EOF. All of these should probably be switched to use .equal().
