
How to prevent scanf causing a buffer overflow in C?

I use this code:

while (scanf("%s", buf) == 1) { /* ... */ }   /* no width limit: a long word overflows buf */

What is the best way to prevent a possible buffer overflow, so that it can be passed strings of arbitrary length?

I know I can limit the length of the input string with, for example:

while (scanf("%20s", buf) == 1) { /* ... */ }   /* buf must hold at least 21 chars */

But I'd prefer to be able to process whatever the user inputs. Or can this not be done safely with scanf, so that I should use fgets instead?

In their book The Practice of Programming (which is well worth reading), Kernighan and Pike discuss this problem, and they solve it by using snprintf() to build a format string with the correct buffer size, which is then passed to the scanf() family of functions. In effect:

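#include <stdio.h>

int scanner(const char *data, char *buffer, size_t buflen)
{
    char format[32];
    if (buflen < 2)     /* need room for 1 char + '\0'; "%0s" is not a valid width */
        return 0;
    /* e.g. "%19s" for a 20-byte buffer: width buflen-1 leaves room for '\0' */
    snprintf(format, sizeof(format), "%%%ds", (int)(buflen-1));
    return sscanf(data, format, buffer);
}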
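Because the question reads from stdin with scanf() rather than from a string, here is a minimal sketch of the same snprintf() trick applied to a stream with fscanf(). The name fscanner() and its return conventions are assumptions for illustration, not part of the book's code.

```c
#include <stdio.h>

/* Hypothetical stream variant of scanner(): builds a width-limited
   "%Ns" format at run time and passes it to fscanf().
   Stores at most buflen-1 characters of one whitespace-delimited word.
   Returns 1 on success, EOF at end of input, 0 if buflen < 2. */
int fscanner(FILE *fp, char *buffer, size_t buflen)
{
    char format[32];

    if (buflen < 2)                 /* no room for a char plus '\0' */
        return 0;
    snprintf(format, sizeof(format), "%%%zus", buflen - 1);
    return fscanf(fp, format, buffer);
}
```

A word longer than buflen - 1 characters is not lost: the unread remainder is returned by the next call, so callers that care about whole words need to detect and handle that case.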

Note that this still limits the input to the size of buffer. If you need more space, you have to allocate memory yourself, or use a non-standard library function that does the allocation for you.
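As one example of a library function that does the allocation for you, POSIX getline() (not part of ISO C) reads a whole line and grows the buffer as needed. A minimal sketch, assuming a POSIX 2008 system; read_any_line() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L     /* expose getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads one line of any length with getline(),
   which reallocates the buffer itself, and strips the trailing newline.
   Returns a malloc'd string the caller must free(), or NULL at end of
   input or on error. */
char *read_any_line(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, fp);

    if (len == -1) {                /* EOF or error; buffer may still exist */
        free(line);
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line;
}
```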


Note that the POSIX 2008 (2013) version of the scanf() family of functions supports a format modifier m (an assignment-allocation character) for string inputs (%s, %c, %[). Instead of taking a char * argument, the conversion takes a char ** argument, and it allocates the necessary space for the value it reads:

char *buffer = 0;
if (sscanf(data, "%ms", &buffer) == 1)
{
    printf("String is: <<%s>>\n", buffer);
    free(buffer);
}

If the sscanf() function fails to satisfy all the conversion specifications, then all the memory it allocated for %ms-like conversions is freed before the function returns.
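The m modifier works with %[ as well, so a whole line of any length can be read in one call. This is a sketch assuming a POSIX 2008 C library that implements m (for example glibc); read_line_m() is a hypothetical helper. The extra handling is needed because %[ fails on an empty line and leaves the newline unread:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line with "%m[^\n]", so fscanf()
   allocates the buffer. Returns a malloc'd string (caller frees),
   "" for an empty line, or NULL at end of input or if malloc fails.
   Note: rc == 0 would also cover fscanf() failing to allocate; this
   sketch does not distinguish that case from an empty line. */
char *read_line_m(FILE *fp)
{
    char *line = NULL;
    int rc = fscanf(fp, "%m[^\n]", &line);

    if (rc == EOF)
        return NULL;                /* nothing left to read */
    if (rc == 0) {                  /* empty line: %[ matched nothing */
        line = malloc(1);
        if (line == NULL)
            return NULL;
        line[0] = '\0';
    }
    getc(fp);                       /* consume the '\n' (or read EOF) */
    return line;
}
```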

If you are using gcc with glibc, you can use the GNU-extension a modifier to have scanf() allocate memory to hold the input for you:

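#define _GNU_SOURCE   /* may be needed for glibc to treat "%as" as the GNU form */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char *str = NULL;

  /* "a" asks scanf() to malloc() the buffer; without _GNU_SOURCE,
     glibc may parse "%a" as the C99 floating-point conversion instead. */
  if (scanf("%as", &str) == 1) {
      printf("\"%s\"\n", str);
      free(str);
  }
  return 0;
}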

Edit: As Jonathan pointed out, you should consult the scanf man pages, because the specifier might be different (%m) and you might need to enable certain feature-test macros when compiling.

Most of the time a combination of fgets and sscanf does the job. Another option, if the input is well formatted, is to write your own parser. Also note that your second example needs a bit of modification to be used safely:

#define LENGTH          42
#define str(x)          # x
#define xstr(x)         str(x)

/* ... */
char array[LENGTH + 1];   /* room for LENGTH chars plus '\0' */
int nc = scanf("%" xstr(LENGTH) "[^\n]%*[^\n]", array);

The %*[^\n] part discards the rest of the line up to, but not including, the newline (\n) character. You will need to add a getchar() to consume that newline. Also check whether you have reached the end of the stream:

if (!feof(stdin)) { ...

And that's about it.
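Putting the pieces of this answer together, here is a minimal sketch that truncates long lines, consumes the newline, and reports end of input through fscanf()'s EOF return value instead of a separate feof() call. read_line_truncated() is a hypothetical name, and it takes a FILE * rather than using stdin so it can be reused:

```c
#include <stdio.h>

#define LENGTH          42
#define str(x)          # x
#define xstr(x)         str(x)

/* Hypothetical helper: stores at most LENGTH chars of one line in array
   (which must hold LENGTH + 1 chars), discards the rest of the line,
   then consumes the newline. Returns 1 if text was stored, 0 for an
   empty line, EOF at end of input. */
int read_line_truncated(FILE *fp, char array[LENGTH + 1])
{
    int nc = fscanf(fp, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);

    if (nc == EOF)
        return EOF;
    if (nc == 0)
        array[0] = '\0';            /* empty line: %[ stored nothing */
    getc(fp);                       /* consume the '\n' left behind */
    return nc;
}
```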

Directly using scanf(3) and its variants poses a number of problems. Typically, both interactive and non-interactive use cases are defined in terms of lines of input. It's rare to see a case where, if not enough objects are found, more lines will solve the problem, yet that's the default mode for scanf. (If a user didn't know to enter a number on the first line, a second and third line will probably not help.)

At least if you use fgets(3), you know how many input lines your program will read, and you won't have any buffer overflows.
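A minimal sketch of the fgets()-then-sscanf() pattern, assuming the goal is to read one int per line. read_int_line() and the 64-byte line buffer are illustrative choices, not from the original answer:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into a fixed buffer with fgets(),
   then parses a single int with sscanf(). An over-long line is drained
   and rejected rather than split. Returns 1 on success, 0 for a bad or
   over-long line, EOF at end of input. */
int read_int_line(FILE *fp, int *out)
{
    char line[64];
    char extra;

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;
    if (strchr(line, '\n') == NULL && !feof(fp)) {
        int c;                      /* line too long: discard the rest */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
        return 0;
    }
    /* " %c" catches trailing junk such as "12abc" */
    if (sscanf(line, "%d %c", out, &extra) != 1)
        return 0;
    return 1;
}
```

Because each call consumes exactly one line, a bad line is reported once and the next call starts cleanly on the following line.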

It's not that much work to write a function that allocates the memory your string needs. Here's a little C function I wrote some time ago; I always use it to read in strings.

It returns the string it read, or NULL if a memory error occurs. Be aware that you have to free() the string, and always check the return value.

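#include <stdio.h>
#include <stdlib.h>

#define BUFFER 32

/* Reads one line from stdin into a malloc'd string, without the newline.
   Returns NULL on allocation failure, or at EOF when nothing was read. */
char *readString(void)
{
    size_t size = BUFFER, pos = 0;
    char *str = malloc(size);
    int c;                              /* int, so EOF can be detected */

    if (str == NULL)
        return NULL;
    while ((c = getchar()) != EOF && c != '\n')
    {
        if (pos + 1 == size)            /* keep room for the '\0' */
        {
            char *tmp = realloc(str, size + BUFFER);
            if (tmp == NULL)
            {
                free(str);
                return NULL;
            }
            str = tmp;
            size += BUFFER;
        }
        str[pos++] = (char)c;
    }
    if (c == EOF && pos == 0)           /* no input left */
    {
        free(str);
        return NULL;
    }
    str[pos] = '\0';
    return str;
}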

Limiting the length of the input is definitely easier. You could accept input of arbitrary length by using a loop that reads a bit at a time and re-allocates space for the string as necessary...

But that's a lot of work, so most C programmers just chop off the input at some arbitrary length. I suppose you know this already, but a single fgets() call isn't going to let you accept arbitrary amounts of text either: you still need to set a limit.
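While one fgets() call is bounded by its buffer, the loop described in this answer can be built on top of it: read fixed-size chunks and append each one to a buffer grown with realloc() until the newline arrives. A minimal sketch; read_long_line() and the 16-byte chunk size are assumptions chosen to keep the example small:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length by calling fgets()
   on fixed-size chunks and growing a malloc'd buffer with realloc().
   The newline is removed. Returns the line (caller frees), or NULL at
   end of input or on allocation failure. */
char *read_long_line(FILE *fp)
{
    char chunk[16];                 /* small on purpose, to exercise growth */
    char *line = NULL;
    size_t len = 0;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        char *tmp = realloc(line, len + n + 1);

        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);   /* copies the '\0' too */
        len += n;
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';           /* end of this line */
            break;
        }
    }
    return line;                    /* NULL if nothing was read */
}
```

Growing by one chunk per iteration keeps the sketch short; doubling the capacity instead would reduce the number of realloc() calls for very long lines.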

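Statement: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.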

 
© 2020-2024 STACKOOM.COM