Lack of understanding about sscanf usage

I would like to parse a specific line, so I wrote the following piece of code to test the logic, but I am probably misunderstanding something:

#include <stdio.h>

typedef struct vers
{
   char tu8UVersion[5];
   char tu8UCommit[32];
}tst_prg_versions;

int main(int argc, char **argv)
{
    tst_prg_versions lstVer;
    char buf1[32];
    char buf2[32];

    char str[] = "BOARD-VERS-v1.0.0-git+9abc12345a";
    sscanf(str, "BOARD-VERS-v%5s-git+%s", lstVer.tu8UVersion, lstVer.tu8UCommit);
    printf("vers='%s'\n", lstVer.tu8UVersion);
    printf("commit='%s'\n", lstVer.tu8UCommit);

    sscanf(str, "BOARD-VERS-v%5s-git+%s", buf1, buf2);
    printf("vers='%s'\n", buf1);
    printf("commit='%s'\n", buf2);
    return 0;
}

Once executed, it prints:

vers='1.0.09abc12345a'
commit='9abc12345a'
vers='1.0.0'
commit='9abc12345a'

Why is the first vers equal to 1.0.09abc12345a and not 1.0.0?

The first sscanf actually does read 1.0.0! The problem is that tu8UVersion ends up without a null terminator: %5s stores five characters plus a terminating null, six bytes in all, but the array holds only five. The terminator is written just past the end of the array, in practice into the first byte of tu8UCommit, where the commit string then overwrites it. As a result printf (not sscanf) prints beyond the field (doing so is undefined behaviour, as noted by sjsam) and runs straight into tu8UCommit, which immediately follows it. This does not necessarily have to be so: there could be some padding bytes in between for alignment reasons.

You need to either print at most 5 characters (%.5s in the printf format string) or leave room to terminate tu8UVersion with 0, as already proposed in a comment. Only the second option also keeps sscanf in bounds, because %5s always writes a terminator after the characters it stores.

Your separate buffers do not show the problem because they are 32 bytes each: the five characters and the terminator that sscanf appends fit easily, so buf1 is properly terminated. Something similar could still happen there if the input were longer than the buffer, since the %s used for the commit has no width limit.

Why is the first vers equal to 1.0.09abc12345a and not 1.0.0?

Remember that you have

typedef struct vers
{
   char tu8UVersion[5];
   char tu8UCommit[32];
}tst_prg_versions;

There is a good chance that the memory for tu8UVersion and tu8UCommit is contiguous. Since tu8UVersion is not null-terminated, when you do:

printf("vers='%s'\n", lstVer.tu8UVersion);

printf goes on to print tu8UCommit, and it stops only because tu8UCommit is null-terminated.

While sscanf seems the most sensible solution here, you could also introduce some formatting:

char tu8UVersion[32];
   /*  A version number can't get too big,
    *  so the first step is to allocate a
    *  reasonably (but not too) big size for it.
    *  That way you can be sure there are a few empty bytes at the end.
    */

and then use a function to sanitize the string:

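#include <stddef.h>  /* size_t */

/* Forces a terminator into the last byte of a buffer of the given size.
 * strlen cannot be used for this: it needs the very terminator
 * that this function is meant to guarantee. */
char *sanitized(char *ptr, size_t size)
{
  if (size > 0)
     ptr[size - 1] = '\0';
  return ptr;
}

and print it like:

 printf("vers='%s'\n", sanitized(lstVer.tu8UVersion, sizeof lstVer.tu8UVersion));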

Your problem has already been identified in the comments: you don't leave space for the terminating null character, so the two strings run together.

If you want to scan a version whose size you don't know beforehand, you can limit the characters to scan to decimal digits and points with %[.-9] (in ASCII this range also admits '/'; %[.0-9] is stricter) or to everything except a hyphen with %[^-]. (The %[...] format is like %s, except that you must provide a list of valid characters in the brackets. A caret as the first character means that the string is made up of characters that are not listed. In other words, %s behaves roughly like %[^ \t\n], except that %s also skips leading whitespace.)

When you scan a string, you should test the return value of sscanf to be sure that all items have been scanned correctly and contain valid values.

Here's a variant that scans version numbers of up to 11 characters:

#include <stdlib.h>
#include <stdio.h>

typedef struct vers
{
   char tu8UVersion[12];
   char tu8UCommit[32];
} tst_prg_versions;

int main(int argc, char **argv)
{
    tst_prg_versions lstVer;

    char str[] = "BOARD-VERS-v1.0.0-git+9abc12345a";
    int n;

    /* Both widths leave one byte for the terminating null. */
    n = sscanf(str, "BOARD-VERS-v%11[^-]-git+%31s",
        lstVer.tu8UVersion, lstVer.tu8UCommit);

    if (n == 2) {
        printf("vers='%s'\n", lstVer.tu8UVersion);
        printf("commit='%s'\n", lstVer.tu8UCommit);
    } else {
        puts("Parse error.");
    }

    return 0;
}
