
fscanf read()s more than the number of characters I asked for

I have the following code:

#include <stdio.h>

int main(void)
{
  unsigned char c;

  setbuf(stdin, NULL);
  scanf("%2hhx", &c);
  printf("%d\n", (int)c);
  return 0;
}

I set stdin to be unbuffered, then ask scanf to read up to 2 hex characters. Indeed, scanf does as asked; for example, having compiled the code above as foo:

$ echo 23 | ./foo
35

However, if I strace the program, I find that libc actually read 3 characters. Here is a partial log from strace:

$ echo 234| strace ./foo
read(0, "2", 1)                         = 1
read(0, "3", 1)                         = 1
read(0, "4", 1)                         = 1
35 # prints the correct result

So scanf is giving the expected result. However, the extra character being read is detectable, and it happens to break the communications protocol I am trying to implement (in my case, GDB remote debugging).

The man page for scanf says about the field width:

Reading of characters stops either when this maximum is reached or when a nonmatching character is found, whichever happens first.

This seems a bit deceptive, at least; or is it in fact a bug? Is it too much to hope that with unbuffered stdin, scanf might read no more than the amount of input I asked for?

(I'm running on Ubuntu 18.04 with glibc 2.27; I've not tried this on other systems.)

"This seems a bit deceptive, at least; or is it in fact a bug?"

IMO, no.

An input item is read from the stream, ... An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure. C17dr § 7.21.6.2 9

Code such as "%hhx" (without a width limit) certainly must read 1 character past the hex characters to know it is done. That excess character is pushed back into stdin for the next input operation.

The sentence "The first character, if any, after the input item remains unread" implies to me a dissociation between reading characters from the stream at the lowest level and reading characters from the stream as a stream: a stream can push back at least 1 character and treat it as "remains unread". The width limit of 2 does not save the code, as 3 characters can be read from the stream and 1 pushed back.

The width of 2 limits the maximum number of bytes to interpret; it is not a limit on the number of characters read at the lowest level.

"Is it too much to hope that with unbuffered stdin, scanf might read no more than the amount of input I asked for?"

Yes. Buffered or not, I think a stream like stdin allows characters to be pushed back so that they are considered unread.

Anyway, "%2hhx" is a brittle choice if you expect no more than 2 characters to be read, because leading white-space does not count: "These white-space characters are not counted against a specified field width."


Setting stdin to be unbuffered does not stop a stream from reading an excess character and later pushing it back.


Given that "this extra character being read is detectable, and it happens to break the communications protocol", I recommend a new approach that does not use a stream.

