Confused about usage of `std::istreambuf_iterator`

I've implemented a deserialization routine for an object using the >> stream operator. The routine itself uses an istreambuf_iterator<char> to extract characters from the stream one by one, in order to construct the object.

Ultimately, my goal is to be able to iterate over a stream using an istream_iterator<MyObject> and insert each object into a vector. Pretty standard, except I'm having trouble getting the istream_iterator to stop iterating when it hits end-of-stream. Right now, it just loops forever, even though calls to istream::tellg() indicate I'm at the end of the file.

Here's code to reproduce the problem:

#include <fstream>
#include <iostream>
#include <iterator>

struct Foo
{
    Foo() { }    
    Foo(char a_, char b_) : a(a_), b(b_) { }

    char a;
    char b;
};

// Output stream operator
std::ostream& operator << (std::ostream& os, const Foo& f)
{
    os << f.a << f.b;
    return os;
}

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    if (is.good()) 
    {
        std::istreambuf_iterator<char> it(is);
        std::istreambuf_iterator<char> end;

        if (it != end) {
            f.a = *it++;
            f.b = *it++;
        }
    }
    return is;
}

int main()
{
    {
        std::ofstream ofs("foo.txt");
        ofs << Foo('a', 'b') << Foo('c', 'd');
    }

    std::ifstream ifs("foo.txt");
    std::istream_iterator<Foo> it(ifs);
    std::istream_iterator<Foo> end;
    for (; it != end; ++it) std::cout << *it << std::endl; // iterates infinitely
}

I know in this trivial example I don't even need istreambuf_iterator, but I'm just trying to simplify the problem so it's more likely people will answer my question.

So the problem here is that even though the istreambuf_iterator reaches the end of the stream buffer, the actual stream itself doesn't enter an EOF state. Calls to istream::eof() return false, even though istream::tellg() reports the position just past the last byte in the file, and istreambuf_iterator<char>(ifs) compares equal to istreambuf_iterator<char>(), meaning I'm definitely at the end of the stream.

I looked at the IOstreams library code to see exactly how it determines whether an istream_iterator is at the end position, and basically it checks whether istream::operator void*() const evaluates to true. This istream library function simply returns:

return this->fail() ? 0 : const_cast<basic_ios*>(this);

In other words, it returns 0 (false) if the failbit is set. It then compares this value to the same value in a default-constructed instance of istream_iterator to determine if we're at the end.

So I tried manually setting the failbit in my std::istream& operator >> (std::istream& is, Foo& f) routine when the istreambuf_iterator compares equal to the end iterator. This worked perfectly, and properly terminated the loop. But now I'm really confused. It seems that istream_iterator definitely checks for std::ios::failbit in order to signify an "end-of-stream" condition. But isn't that what std::ios::eofbit is for? I thought failbit was for error conditions, for example if the underlying file of an fstream couldn't be opened.

So, why do I need to call istream::setstate(std::ios::failbit) to get the loop to terminate?

When you use istreambuf_iterator, you are manipulating the underlying streambuf object of the istream object. The streambuf object doesn't know anything about its owner (the istream object), so calling functions on the streambuf object does not make changes to the istream object. That's why the flags in the istream object are not set when you reach the end of the file.
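The separation between the buffer and the stream's flags can be observed directly. The following is a minimal sketch, assuming an std::istringstream in place of the file stream; the two helper function names are made up for illustration.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Drains the stream's buffer through istreambuf_iterator and reports
// whether the owning stream noticed, i.e. whether eofbit was set.
// (Hypothetical helper, for illustration only.)
bool eof_flag_after_draining_buffer(const std::string& text)
{
    std::istringstream is(text);
    std::istreambuf_iterator<char> it(is), end;
    while (it != end) ++it;   // consumes every character via the streambuf
    return is.eof();          // the flags live in the stream, not the buffer
}

// Same draining step, followed by one read through the stream itself.
// Only this stream-level read updates the state flags.
bool flags_after_stream_level_read(const std::string& text)
{
    std::istringstream is(text);
    std::istreambuf_iterator<char> it(is), end;
    while (it != end) ++it;
    is.get();                 // stream-level read that finds no character
    return is.eof() && is.fail();
}
```

The first helper should report that eofbit is still clear even though the buffer is exhausted; the second should report that both eofbit and failbit are set once the stream itself attempts a read.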

Do something like this:

std::istream& operator >> (std::istream& is, Foo& f)
{
    is.read(&f.a, sizeof(f.a));
    is.read(&f.b, sizeof(f.b));
    return is;
}

Edit

I was stepping through the code in my debugger, and this is what I found. istream_iterator has two internal data members: a pointer to the associated istream object, and an object of the template type (Foo in this case). When you call ++it, it calls this function:

void _Getval()
{    // get a _Ty value if possible
    if (_Myistr != 0 && !(*_Myistr >> _Myval))
        _Myistr = 0;
}

_Myistr is the istream pointer, and _Myval is the Foo object. If you look here:

!(*_Myistr >> _Myval)

That's where it calls your operator>> overload, and then it calls operator! on the returned istream object. operator! only returns true if failbit or badbit is set; eofbit alone doesn't do it.

So, what happens next is that, if either failbit or badbit is set, the istream pointer gets set to NULL. The next time you compare the iterator to the end iterator, it compares the istream pointers, which are NULL in both of them.
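The effect of this rule is visible with a built-in type. Below is a minimal sketch, assuming std::istream_iterator<int> over an std::istringstream; ScanResult and scan_ints are hypothetical names introduced only for this example. It counts how many values the iterator yields and records the stream flags afterwards.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical result type: what the loop produced and how the stream ended up.
struct ScanResult {
    std::size_t count;  // number of ints successfully extracted
    bool eof;           // eofbit after the loop
    bool fail;          // failbit or badbit after the loop
};

ScanResult scan_ints(const std::string& text)
{
    std::istringstream is(text);
    std::istream_iterator<int> it(is), end;
    std::size_t n = 0;
    // The iterator becomes equal to `end` only after an extraction fails,
    // that is, once operator! on the stream returns true.
    for (; it != end; ++it) ++n;
    return ScanResult{n, is.eof(), is.fail()};
}
```

With the input "1 2 x 4" the loop should stop after two values with failbit set and eofbit clear, which shows that it is the failed extraction, not end-of-file, that ends the iteration.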

Your outer loop, where you're checking for your istream_iterator to have reached its end, is tied to state stored in the istream's inherited ios_base. The state on the istream represents the outcome of recent extraction operations performed against the istream itself, not the state of its underlying streambuf.

Your inner loop, where you're using istreambuf_iterator to extract characters from the streambuf, is using lower-level functions like basic_streambuf::sgetc() (for operator*) and basic_streambuf::sbumpc() (for operator++). Neither of those functions sets state flags as a side effect; the only side effect is the second one advancing basic_streambuf::gptr.

Your inner loop works fine, but packaged as it is inside operator>>, it's implemented in a sneaky way that violates the contract of std::basic_istream& operator>>(std::basic_istream&, T&). If the function fails to extract an element as intended, it must call basic_ios::setstate(failbit) and, if it also encountered end-of-stream while extracting, it must also call basic_ios::setstate(eofbit). Your extractor function sets neither flag when it fails to extract a Foo.

I concur with the other advice here to avoid use of istreambuf_iterator for implementing an extraction operator meant to work at the istream level. You're forcing yourself to do extra work to maintain the istream contract, and any lapse in that work will cause other downstream surprises like the one that brought you here.
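One way to stay at the istream level is to read with istream::get(char&), which updates the stream state itself when no character is available. This is a minimal sketch under that assumption, not the only possible implementation; Foo is reduced to a plain aggregate and read_all is a hypothetical helper added for the demonstration.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct Foo {
    char a;
    char b;
};

// Extractor that works purely through the istream interface.
std::istream& operator>>(std::istream& is, Foo& f)
{
    char a, b;
    // get(char&) sets failbit (and eofbit) on `is` when it cannot read a
    // character, so no manual setstate() call is needed here.
    if (is.get(a) && is.get(b)) {
        // Commit only after both characters were read successfully.
        f.a = a;
        f.b = b;
    }
    return is;
}

// Hypothetical helper: collects every Foo that can be extracted from `text`.
std::vector<Foo> read_all(const std::string& text)
{
    std::istringstream is(text);
    std::istream_iterator<Foo> first(is), last;
    return std::vector<Foo>(first, last);
}
```

Reading into temporaries first means a half-read Foo never overwrites the caller's object, so an input such as "abc" should yield one complete Foo and then a clean failure.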

In your operator>> you should set failbit any time you fail to successfully read a Foo. Additionally, you should set eofbit any time you detect end of file. This could look like this:

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    if (is.good()) 
    {
        std::istreambuf_iterator<char> it(is);
        std::istreambuf_iterator<char> end;

        std::ios_base::iostate err = it == end ? (std::ios_base::eofbit |
                                                  std::ios_base::failbit) :
                                                 std::ios_base::goodbit;
        if (err == std::ios_base::goodbit) {
            char a = *it;
            if (++it != end)
            {
                char b = *it;
                if (++it == end)
                    err = std::ios_base::eofbit;
                f.a = a;
                f.b = b;
            }
            else
                err = std::ios_base::eofbit | std::ios_base::failbit;
        }
        if (err)
            is.setstate(err);
    }
    else
        is.setstate(std::ios_base::failbit);
    return is;
}

With this extractor, which sets failbit on failure to read, and eofbit on detecting end of file, your driver works as expected. Note especially that even if your outer if (is.good()) fails, you still need to set failbit. Your stream might be !good() because only eofbit is set.

You can slightly simplify the above by using an istream::sentry for the outer test. If the sentry fails, it will set failbit for you:

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    std::istream::sentry ok(is);
    if (ok) 
    {
        std::istreambuf_iterator<char> it(is);
        std::istreambuf_iterator<char> end;

        std::ios_base::iostate err = it == end ? (std::ios_base::eofbit |
                                                  std::ios_base::failbit) :
                                                 std::ios_base::goodbit;
        if (err == std::ios_base::goodbit) {
            char a = *it;
            if (++it != end)
            {
                char b = *it;
                if (++it == end)
                    err = std::ios_base::eofbit;
                f.a = a;
                f.b = b;
            }
            else
                err = std::ios_base::eofbit | std::ios_base::failbit;
        }
        if (err)
            is.setstate(err);
    }
    return is;
}

The sentry also skips leading whitespace. This may or may not be what you want. If you don't want the sentry to skip leading whitespace, you can construct it with:

    std::istream::sentry ok(is, true);

If the sentry detects end of file while skipping leading whitespace, it will set both failbit and eofbit.
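The two sentry behaviours described above can be checked in isolation. The sketch below assumes an std::istringstream with default formatting flags (skipws enabled); both function names are hypothetical and exist only for this illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the character that is next in the buffer once the sentry has been
// constructed, or '\0' if the sentry reported failure. (Hypothetical helper.)
char next_char_after_sentry(const std::string& text, bool noskipws)
{
    std::istringstream is(text);
    std::istream::sentry ok(is, noskipws);
    // sgetc() peeks at the current character without consuming it.
    return ok ? static_cast<char>(is.rdbuf()->sgetc()) : '\0';
}

// Reports whether a default sentry fails and leaves both failbit and eofbit
// set on the stream. (Hypothetical helper.)
bool sentry_sets_fail_and_eof(const std::string& text)
{
    std::istringstream is(text);
    std::istream::sentry ok(is);
    return !ok && is.fail() && is.eof();
}
```

For the input "  x", the default sentry should leave 'x' as the next character, while a sentry constructed with true as the second argument should leave the first space in place. An input consisting only of whitespace should make the default sentry fail with both flags set.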

It looks like the two sets of stream iterators are interfering with each other:

I got it working with this:

// Input stream operator
std::istream& operator >> (std::istream& is, Foo& f)
{
    f.a = is.get();
    f.b = is.get();

    return is;
}

I think your end condition needs to use the .equal() method, instead of using the comparison operator.

for (; !it.equal(end); ++it) cout << *it << endl;

I usually see this implemented with a while loop instead of a for loop:

while ( !it.equal(end)) {
    cout << *it++ << endl;
}

I think those two will have the same effect, but (to me) the while loop is clearer.

Note: You have a number of other spots where you're using the comparison operator to check whether an iterator is at eof. All of these should probably be switched to use .equal().
