
How sscanf() splits a string in C

So basically I have a simple string and I am trying to use sscanf() to split it and store the values in the appropriate variables:

#include<stdio.h>
int main()
{

    int i=10,j;
    char ch = 'A',ch2;
    float a=3.14,b;
    char str[20]="10A3.140000";


    sscanf(str,"%d%c%f",&j,&ch2,&b);
    printf("%d %c %f\n",j,ch2,b);
    return 0;
}

Now my question is how exactly sscanf knows where to split the string. Here the value 10 goes into variable j, the character A goes into variable ch2, and 3.140000 goes into variable b. How exactly does sscanf parse this string and store the values in different variables? I would appreciate it if you could explain how sscanf works with any string. I am having a hard time understanding it.

scanf consumes the input string according to the format specifiers you defined in the format string.

Let's analyze your format string:

sscanf(str,"%d%c%f",&j,&ch2,&b);
  • %d parses the string looking for a decimal integer. Only digits in the range 0-9 are accepted, optionally preceded by a sign character (as in -123). All characters meeting these criteria are consumed.
  • %c scans any character, saving its character code (its ASCII value on most systems) in the destination variable. A single character is consumed.
  • %f parses the string looking for a floating-point number. As with %d, decimal digits and a leading sign are accepted, but it also accepts the . character, interpreting it as the decimal point.

So, let's look again at your input string:

char str[20]="10A3.140000";
  1. scanf starts parsing from the first character. Since it finds decimal digits, it consumes them until the first non-numeric character ('A') is found. The integer 10 is stored in j.

    If the string ended here, scanf would return 1.

  2. scanf continues parsing from the letter 'A'. Since the next format specifier is %c, this character is consumed and the value 65 (ASCII for 'A') is stored in ch2.

    If the string ended here, scanf would return 2.

  3. After the letter 'A' is consumed, scanf continues parsing from the character '3'. The %f format specifier accepts all the remaining characters (because they are all valid), and the value 3.14 is stored in variable b.

    scanf now returns 3.


I tried to emphasize the value returned by scanf because checking it is how you keep some control over what is going on. In your case

if( sscanf(str,"%d%c%f",&j,&ch2,&b) == 3 )
{
    printf("%d %c %f\n",j,ch2,b);
}

would prevent those scenarios in which a malformed string would lead to undefined behavior. To understand this scenario, consider the following string:

char str[20]="10AB3.140000";

Here we have two non-numeric characters after the first integer. Then:

  • %d would still consume the value 10
  • %c would still consume the letter 'A'
  • but %f would not consume the rest of the string, because the substring "B3.140000" can't be parsed as a float: the leading 'B' forces scanf to stop parsing, and the value 2 is returned.

So, without a check on the return value, the uninitialized variable b would be accessed.

This explanation is a little over-simplified, and there is more going on inside sscanf, as the source code itself is more than 3,000 lines long.

The format specifiers you used in the example are listed below with their corresponding match patterns:
%d : [0-9] , [-,+]
%f : [0-9] , [-,+,.]
%c : [a-z] , [A-Z] , other ASCII characters

Parsing

  1. First, for %d, it looks for a [+,-] sign and then for a sequence of single digits [0-9]. As soon as it sees a character that is not a digit (A = 65) in the string, it ends its search for %d and saves the int value to the corresponding address passed to sscanf.
  2. Second, it takes the next format specifier, i.e. %c, and begins again from the location where it previously stopped. This time it just has to read a single byte from the string and save it to the corresponding address passed to sscanf.
  3. Third, for %f, it looks for a [+,-] sign and then for . and the single digits [0-9]. This continues until it reaches something that is not part of the %f, which in this case is \0. It then saves the value to the corresponding address passed to sscanf.

Finally, once everything is completed and the string is split, it returns the total number of arguments parsed. So it is good programming practice to compare the number of arguments passed with the returned value.

Reference:

  1. sscanf source code
  2. vfscanf source code
