
printf and unsafe formatting strings

Original post: 2012-02-08 14:54:42 9 2 c/ printf/ string-formatting/ user-input

The application in question allows users to define their own messages (mainly for customization and/or localization purposes) in a plain-text configuration file. These messages are passed to printf-style functions as format strings at runtime. If a user-defined format string is faulty, a whole lot of bad things can happen: a conversion specifier that does not match the arguments actually passed is undefined behavior, which can mean garbage output or a crash.

What is the best way to sanitize such user-supplied format strings? Or should I drop this approach entirely and use another method to let users safely customize the messages?

The solution must be reasonably portable (Windows, Linux, BSD; x86, x86-64).

2 answers

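Define your own formatting language, which your code translates into a valid format string. That limits how much trouble users can get into (for example, don't allow % at all, and define your own symbol or marker to indicate where a % should appear in the output).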
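One way this idea could look in C is sketched below. It is only an illustration under stated assumptions: the marker `{}` (meaning "insert a string argument here") and the helper name `build_format` are hypothetical and not part of the original answer. The helper rejects any `%` in the user template, turns each `{}` into `%s`, and checks that the number of markers matches what the calling code will actually pass.

```c
#include <stddef.h>

/* Hypothetical helper: translate a user template into a printf format string.
 * "{}" marks where a string argument goes; '%' is rejected outright.
 * A lone '{' or '}' is copied through unchanged (harmless to printf).
 * Returns 0 on success, -1 if the template contains '%', has a different
 * number of "{}" markers than expected_args, or does not fit in out. */
int build_format(const char *tmpl, size_t expected_args,
                 char *out, size_t out_size)
{
    size_t n = 0;     /* bytes written to out so far */
    size_t args = 0;  /* number of "{}" markers seen */

    if (out_size == 0)
        return -1;

    for (const char *p = tmpl; *p != '\0'; ++p) {
        if (*p == '%')
            return -1;                 /* users may not write '%' at all */

        if (p[0] == '{' && p[1] == '}') {
            if (n + 2 >= out_size)     /* need room for "%s" plus '\0' */
                return -1;
            out[n++] = '%';
            out[n++] = 's';
            ++args;
            ++p;                       /* skip the closing '}' */
        } else {
            if (n + 1 >= out_size)     /* need room for one char plus '\0' */
                return -1;
            out[n++] = *p;
        }
    }

    out[n] = '\0';
    return args == expected_args ? 0 : -1;
}
```

A caller would then do something like `if (build_format(user_tmpl, 1, fmt, sizeof fmt) == 0) printf(fmt, name);` and fall back to a built-in default message otherwise. Because the only conversion the translator can ever emit is `%s`, and the count is checked, the resulting format string always matches the arguments supplied.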

You have two choices:

1. Let a user's mess-ups (intentional or not) affect only that user, i.e. don't let users' personal configurations interfere with each other.
2. Don't let users customize the results. Or if you do, make the customization so limited that there is nothing harmful they can do with it.
For example, I've frequently allowed users to provide their own input to things like printf(), but only after filtering it down to a very limited character set. E.g., I'll use a regexp like ^[a-zA-Z0-9_]+$ and not let anything else in.
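Since the question asks for portability and standard C has no regular-expression library, the same whitelist can be sketched with `strspn` alone. The function name `is_safe_token` is an assumption made for this illustration; the check itself mirrors the pattern ^[a-zA-Z0-9_]+$ (non-empty, and every character in the allowed set).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true only if s is non-empty and consists solely of
 * ASCII letters, digits, and underscore. The allowed set is spelled out
 * explicitly so the result does not depend on the current locale. */
bool is_safe_token(const char *s)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_";
    size_t len = strlen(s);

    /* strspn returns the length of the leading run made up of allowed
     * characters; the string passes only if that run covers all of it. */
    return len > 0 && strspn(s, allowed) == len;
}
```

Anything that fails the check (including any string containing `%`, spaces, or punctuation) would simply be refused or replaced by a default. This is deliberately restrictive, which fits short identifiers better than full sentences.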