Reading with setw: to eof or not to eof?

Consider the following simple example:

#include <string>
#include <sstream>
#include <iomanip>

using namespace std;

int main() {
  string str = "string";
  istringstream is(str);
  is >> setw(6) >> str;
  return is.eof();
}

At first sight, since the explicit width is specified by the setw manipulator, I'd expect the >> operator to finish reading the string after successfully extracting the requested number of characters from the input stream. I don't see any immediate reason for it to try to extract the seventh character, which means that I don't expect the stream to enter the eof state.

When I run this example under MSVC++, it works as I expect: the stream remains in a good state after reading. However, in GCC the behavior is different: the stream ends up in the eof state.

The language standard gives the following list of completion conditions for this version of the >> operator:

  • n characters are stored;
  • end-of-file occurs on the input sequence;
  • isspace(c,is.getloc()) is true for the next available input character c.

Given the above, I don't see any reason for the >> operator to drive the stream into the eof state in the above code.

However, this is what the >> operator implementation in the GCC library looks like:

...
__int_type __c = __in.rdbuf()->sgetc();

while (__extracted < __n
       && !_Traits::eq_int_type(__c, __eof)
       && !__ct.is(__ctype_base::space,
                   _Traits::to_char_type(__c)))
{
  if (__len == sizeof(__buf) / sizeof(_CharT))
  {
    __str.append(__buf, sizeof(__buf) / sizeof(_CharT));
    __len = 0;
  }
  __buf[__len++] = _Traits::to_char_type(__c);
  ++__extracted;
  __c = __in.rdbuf()->snextc();
}
__str.append(__buf, __len);

if (_Traits::eq_int_type(__c, __eof))
  __err |= __ios_base::eofbit;
__in.width(0);
...

As you can see, at the end of each successful iteration, it attempts to prepare the next __c character for the next iteration, even though the next iteration might never occur. After the loop, it analyzes the last value of that __c character and sets the eofbit accordingly.

So, my question is: is triggering the eof stream state in the above situation, as GCC does, legal from the standard's point of view? I don't see it explicitly specified in the document. Are both MSVC's and GCC's behaviors compliant? Or is only one of them behaving correctly?

The definition of that particular operator>> is not relevant to the setting of the eofbit, as it only describes when the operation terminates, not what triggers a particular bit.

The description of the eofbit in the standard (draft) says:

eofbit - indicates that an input operation reached the end of an input sequence;

I guess here it depends on how you want to interpret "reached". Note that the gcc implementation correctly does not set failbit, which is defined as:

failbit - indicates that an input operation failed to read the expected characters, or that an output operation failed to generate the desired characters.

So I think eofbit does not necessarily mean that the end of file prevented the extraction of any new characters, just that the end of file has been "reached".

I can't seem to find a more precise description of "reached", so I guess that would be implementation-defined. If this logic is correct, then both the MSVC and gcc behaviors are correct.


EDIT: In particular, it seems that eofbit gets set when sgetc() would return eof. This is described both in the istreambuf_iterator section and in the basic_istream::sentry section. So now the question is: when is the current position of the stream allowed to advance?


FINAL EDIT: It turns out that g++ probably has the correct behavior.

Every character scan passes through <locale>, in order to allow different character sets, money formats, time descriptions and number formats to be parsed. While there does not seem to be a thorough description of how operator>> works for strings, there are very specific descriptions of how the do_get functions for numbers, time and money are supposed to operate. You can find them from page 687 of the draft onward.

All of these start off by reading a ctype (the "global" version of a character, as read through locales) from an istreambuf_iterator (for numbers, you can find the call definitions on page 1018 of the draft). Then the ctype is processed, and finally the iterator is advanced.

So, in general, this requires the internal iterator to always point to the next character after the last one read; if that were not the case, you could in theory extract more than you wanted:

string str = "strin1";
istringstream is(str);
is >> setw(6) >> str;
int x;
is >> x;

If, after the extraction into str, the current position of the stream is were not at eof, then the standard would require x to get the value 1, since for numeric extraction the standard explicitly requires that the iterator be advanced after the first read.

Since this does not make much sense, and given that all complex extractions described in the standard behave in the same way, it makes sense that the same would happen for strings. Thus, since the position of is falls on eof after reading 6 characters, the eofbit needs to be set.
