
How does this make sense, and why does the sscanf function still work?

As you can see, I allocated only 1 byte (sizeof(char)) inside the loop, yet sscanf() still reads the whole block up to the blank into string_of_letters. How is this possible?

What is the definition of sscanf()?

For example: str = "rony is a man", but at position i of string_of_letters I see "rony".

char **string_of_letters;
int i;
char *read = str;

string_of_letters = (char**)malloc(3 * sizeof(char*));
for (i = 0; i < 3; i++) {
    string_of_letters[i] = (char*)malloc(sizeof(char));
    sscanf(read,"%[^, ]", &(*string_of_letters[i]));
    printf("%s\n", string_of_letters[i]);
}

C does not impose runtime memory bounds checking, so the fact that you allocated only one byte has no effect on how sscanf operates: it will happily attempt to store the entire string at the memory location designated by the pointer you provide. If the buffer is not big enough, though, the result is undefined behavior, the exact consequences of which depend on too many factors to enumerate (the compiler and its version, the operating system, the current state of memory, etc.).

In a small toy program such as yours, it is not surprising that it appears to work properly, as the strings are short and there is not much else going on. In a larger program, however, sscanf would likely write past the end of the buffer passed in and into another buffer allocated for something else, altering memory you did not want to change, or, if you're lucky, into protected memory, causing an access violation.
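One way to keep sscanf inside the buffer is to put a maximum field width in the conversion and allocate one byte more than that width for the terminating null. The following is only a minimal sketch of that idea; the names read_first_word and WORD_MAX, and the limit of 15, are assumptions made for illustration and are not part of the question's code.

```c
#include <assert.h>
#include <stdio.h>

enum { WORD_MAX = 15 };   /* hypothetical limit; must match the width in the format */

/*
 * Copies the first token of src (everything up to a comma or space) into
 * dst, which must have room for WORD_MAX + 1 bytes.
 * Returns 1 if a token was stored, 0 otherwise.
 */
static int read_first_word(const char *src, char dst[WORD_MAX + 1])
{
    assert(src != NULL && dst != NULL);
    /* The width 15 caps the number of characters stored; sscanf then
       appends the null byte, so dst needs 16 bytes in total. */
    return sscanf(src, "%15[^, ]", dst) == 1;
}
```

With a 16-byte buffer buf, read_first_word("rony is a man", buf) stores "rony" and returns 1. A longer token is truncated to 15 characters instead of overrunning the buffer, and input that starts with a separator returns 0 because scan sets do not skip leading white space.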

There are many ways to fix up the code fragment shown. The code below shows three of them. As noted in comments to the question, you need to allocate at least 2 characters inside the loop (because %[…] scan sets create a null-terminated string), but then you could use %1[^, ] as the conversion to get one character at a time. Note that you need to test the return value of sscanf() to check that you got what you expected. You also need to increment read so as not to read the same character over and over. In more general cases, you'd use %n to be told where the scan stopped (see Using sscanf() in a loop). Scan sets do not skip white space, and neither do %c or %n; all other standard conversions do skip leading white space, including newlines.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { LIST_SIZE = 3 };

static void free_array(size_t n, char **arr)
{
    for (size_t i = 0; i < n; i++)
        free(arr[i]);
    free(arr);
}

int main(void)
{
    char str[] = "rony is a man";
    char **string_of_letters;
    char *read = str;

    printf("Variant 1:\n");
    string_of_letters = (char **)malloc(LIST_SIZE * sizeof(char *));
    for (int i = 0; i < LIST_SIZE; i++)
    {
        string_of_letters[i] = (char *)malloc(2 * sizeof(char));
        if (sscanf(&read[i], "%1[^, ]", string_of_letters[i]) != 1)
            printf("Conversion failed on %d\n", i);
        else
            printf("%s\n", string_of_letters[i]);
    }

    free_array(LIST_SIZE, string_of_letters);

    printf("Variant 2:\n");
    string_of_letters = (char **)malloc(LIST_SIZE * sizeof(char *));
    for (int i = 0; i < LIST_SIZE; i++)
    {
        string_of_letters[i] = (char *)malloc(sizeof(char));
        *string_of_letters[i] = read[i];
        printf("%c\n", *string_of_letters[i]);
    }

    free_array(LIST_SIZE, string_of_letters);

    printf("Variant 3:\n");
    strcpy(str, "  r o  n");

    char char_list[LIST_SIZE + 1];      // NB: + 1 provides space for null byte
    int offset = 0;
    for (int i = 0; i < LIST_SIZE; i++)
    {
        int pos;
        printf("Offset = %d: ", offset);
        if (sscanf(&read[offset], " %1[^, ]%n", &char_list[i], &pos) != 1)
        {
            printf("Conversion failed on character index %d\n", i);
            break;
        }
        else
            printf("%c\n", char_list[i]);
        offset += pos;
    }

    return 0;
}

The code shown runs cleanly under Valgrind on a Mac running macOS 10.13.6 High Sierra with Valgrind 3.14.0.GIT (a version extracted from Git, rather than a formally released set of source code).

Output:

Variant 1:
r
o
n
Variant 2:
r
o
n
Variant 3:
Offset = 0: r
Offset = 3: o
Offset = 5: n
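If the goal is to split the string into whole words rather than single characters, the same %n technique can be combined with a field width. This is a rough sketch under stated assumptions, not the answer's own code: split_words and MAX_WORD_LEN are hypothetical names, the 15-character limit is arbitrary, and strspn() is used to step over separators because scan sets do not skip them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAX_WORD_LEN = 15 };   /* hypothetical limit; must match the width in the format */

/*
 * Splits src on spaces and commas, storing at most max_words tokens in
 * words. Tokens longer than MAX_WORD_LEN are broken into several pieces.
 * Returns the number of tokens stored.
 */
static size_t split_words(const char *src, char words[][MAX_WORD_LEN + 1],
                          size_t max_words)
{
    assert(src != NULL && words != NULL);
    size_t count = 0;
    size_t offset = 0;

    while (count < max_words)
    {
        int pos = 0;
        /* Step over any run of separators before the next token. */
        offset += strspn(src + offset, ", ");
        /* %n reports how many characters this call consumed. */
        if (sscanf(src + offset, "%15[^, ]%n", words[count], &pos) != 1)
            break;              /* end of string: nothing left to convert */
        offset += (size_t)pos;
        count++;
    }
    return count;
}
```

Called with "rony is a man" and room for four words, this stores "rony", "is", "a" and "man" and returns 4. Using fixed-size rows avoids the per-word malloc() calls entirely, at the cost of a hard upper limit on word length.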

As already observed, the code in the question sorta works, more by accident than design. The pointer returned by malloc() is constrained so that it points to a memory location that could be used for any purpose:

C11 §7.22.3 Memory management functions

¶1 … The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (…). …

That means that successive allocations of a single char will not be contiguous because of the alignment requirements of other types. Typically, you'll find that the minimum space allocated is 8 or 16 bytes (on 32-bit or 64-bit platforms respectively), but that's by no means required. This does mean that there is often more space allocated than you requested (especially if you request a single byte). However, accessing that extra space leads to undefined behaviour. The run of your example code shows that sometimes 'undefined behaviour' behaves more or less as expected.
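To see the spacing for yourself, you can compare the addresses of two one-byte allocations. The sketch below is illustrative only: one_byte_allocation_gap is a hypothetical helper, the result is implementation-specific (16 or 32 bytes is common on 64-bit systems, but nothing guarantees it), and nothing obliges the allocator to place the two blocks next to each other at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Returns the distance in bytes between two one-byte allocations made back
 * to back, or 0 if either allocation fails. The value depends entirely on
 * the allocator in use.
 */
static uintptr_t one_byte_allocation_gap(void)
{
    char *a = malloc(1);
    char *b = malloc(1);
    uintptr_t gap = 0;

    if (a != NULL && b != NULL)
    {
        /* Two live allocations never share an address. */
        assert(a != b);
        /* Compare as integers: subtracting unrelated pointers is undefined. */
        uintptr_t ua = (uintptr_t)a;
        uintptr_t ub = (uintptr_t)b;
        gap = (ua > ub) ? ua - ub : ub - ua;
    }
    free(a);
    free(b);
    return gap;
}
```

Printing the result with printf("%ju\n", (uintmax_t)one_byte_allocation_gap()) shows how far apart the allocator placed the two blocks. Any space between them still belongs to the allocator, not to your program, so writing there remains undefined behaviour.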


 