What is the correct/safest way to escape input in a forum?

I am creating forum software with a PHP and MySQL backend, and I want to know the most secure way to escape user input for forum posts.

I know about htmlentities(), strip_tags(), htmlspecialchars(), and mysql_real_escape_string(), and even JavaScript's escape(), but I don't know which to use and where.

What would be the safest way to process these three different types of input (by process, I mean receive, save in a database, and display):

  1. The title of a post (which will also be the basis of the URL permalink).
  2. The content of a forum post limited to basic text input.
  3. The content of a forum post which allows HTML.

I would appreciate an answer that tells me which of these escape functions I need to use in combination, and why. Thanks!

When generating HTML output (as you do to get data into the form's fields when someone is editing a post, or when you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars(): it will escape <, >, ", ', and &, depending on the flags you give it.

strip_tags will remove tags if the user has entered some, and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
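
As a minimal sketch of the form re-display case described above (the field name and the sample value are made up for illustration), escaping with ENT_QUOTES covers both quote styles inside an attribute:

```php
<?php
// Hypothetical value the user typed into the "title" field.
$title = 'He said "hi" & <b>left</b>';

// ENT_QUOTES escapes double and single quotes, so the value cannot
// break out of the attribute whichever quote style the markup uses.
$safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

// Re-display the form field with the escaped value.
echo '<input type="text" name="title" value="' . $safe . '">' . "\n";
```

The browser shows the original text in the field, including the literal `<b>` tags, because the entities are decoded only for display.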


Once you have what the user entered in the form (i.e., when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful: they escape data for SQL.

You might also want to take a look at prepared statements, which might help you a bit ;-)
They are available with mysqli and with PDO.
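
A rough sketch of a prepared statement with PDO follows. To keep it runnable without a server, it assumes the pdo_sqlite extension and uses an in-memory SQLite database as a stand-in for MySQL; the table and column names are invented for the example.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL here. For MySQL the
// DSN would look something like 'mysql:host=...;dbname=...;charset=utf8mb4'.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table for forum posts.
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)');

// Hostile-looking input: it is bound as data and never parsed as SQL.
$title = "Robert'); DROP TABLE posts;--";
$body  = 'Plain <b>text</b> body';

$insert = $pdo->prepare('INSERT INTO posts (title, body) VALUES (:title, :body)');
$insert->execute([':title' => $title, ':body' => $body]);

// Read the row back using another bound parameter.
$select = $pdo->prepare('SELECT title, body FROM posts WHERE id = :id');
$select->execute([':id' => $pdo->lastInsertId()]);
$row = $select->fetch(PDO::FETCH_ASSOC);
```

The values are stored exactly as typed, with no manual escaping; HTML escaping is a separate step that happens when the row is displayed.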

You should not use anything like addslashes: the escaping it does doesn't depend on the database engine. It is better and safer to use a function that fits the engine (MySQL, PostgreSQL, ...) you are working with: it will know precisely what to escape, and how.


Finally, to display the data inside a page:

  • for fields that must not contain HTML, you should use htmlspecialchars(): if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
  • for fields that can contain HTML... This is a bit trickier: you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it leaves the attributes of the allowed tags untouched).
    • You might want to take a look at a tool called HTML Purifier: it will allow you to specify which tags and attributes should be allowed, and it generates valid HTML, which is always nice ^^
    • This might take some time to compute, and you probably don't want to regenerate that HTML each time it has to be displayed; so you can think about storing it in the database (either keeping only the clean HTML, or keeping both it and the unclean version in two separate fields, which might be useful to let people edit their posts).
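
To illustrate why strip_tags alone falls short for the "some HTML allowed" case, here is a small sketch with a made-up post body. Only the tag names are filtered; attributes on the allowed tags pass through:

```php
<?php
// Hypothetical post content carrying an event-handler attribute.
$html = '<p onmouseover="alert(1)">Hello <i>big</i> <b>world</b></p>';

// The second argument lists the tags to keep; everything else is stripped.
$filtered = strip_tags($html, '<p><b>');

echo $filtered . "\n";
```

The `<i>` tags are removed, but the `onmouseover` attribute on the allowed `<p>` tag is still present in the output, which is the gap a dedicated filter such as HTML Purifier is meant to close.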


Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions!

mysql_real_escape_string() escapes everything you need to put in a MySQL database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.

Anything else can be done with htmlspecialchars() to neutralize HTML in the input (it escapes the markup rather than removing it) and urlencode() to put things in a format suitable for URLs.
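
As a small sketch of how the two functions combine for a post title that ends up in a link (the `/post.php` path and the `title` parameter are hypothetical), each function is applied for the context the value is entering:

```php
<?php
// Hypothetical post title containing characters special to URLs and HTML.
$title = 'Tom & Jerry: "best" <episodes>?';

// urlencode() when the value becomes part of a URL (a query string here).
$url = '/post.php?title=' . urlencode($title);

// htmlspecialchars() when the URL and the title are written into HTML.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
      . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</a>';

echo $link . "\n";
```

The order matters: the URL is built first, and the finished URL is then escaped for the HTML attribute.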

There are two completely different types of attack you have to defend against:

  • SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still.
  • Cross-site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definitive way to defend against this.

Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities for smuggling JavaScript into HTML. If you decide to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while escaping all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML, JavaScript, and CSS completely.
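
One possible way to handle the javascript: URL concern is an allowlist of schemes, checked before a BBCode or Markdown link is turned into an `<a>` tag. The helper below is a hypothetical sketch, not a complete link sanitizer; it accepts absolute http and https URLs only.

```php
<?php
// Hypothetical helper: accept a link target only if it is an absolute
// http(s) URL. Anything else (javascript:, data:, relative paths, ...)
// is rejected instead of being listed as a known-bad scheme.
function is_safe_link(string $url): bool
{
    $scheme = parse_url(trim($url), PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false; // malformed URL or no scheme at all
    }
    return in_array(strtolower($scheme), ['http', 'https'], true);
}

$target = 'javascript:alert(1)';
echo is_safe_link($target) ? "allowed\n" : "rejected\n";
```

An allowlist is used here because new dangerous schemes do not have to be anticipated; an accepted URL still needs htmlspecialchars() when it is written into the href attribute.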

The answer to this post is a good answer.

Basically, using the PDO interface to parameterize your queries is much safer and less error-prone than escaping your inputs manually.

I have a tendency to escape all characters that would be problematic in page display, JavaScript, and SQL all at the same time. This leaves the text readable on the web and in HTML email, and at the same time removes any problems with the code. A VB.NET line of code would be:

SafeComment = Replace( _
              Replace(Replace(Replace( _
              Replace(Replace(Replace( _
              Replace(Replace(Replace( _
              Replace(Replace(Replace( _
                HttpUtility.HtmlEncode(Trim(strInput)), _
                  ":", "&#x3A;"), "-", "&#x2D;"), "|", "&#x7C;"), _
                  "`", "&#x60;"), "(", "&#x28;"), ")", "&#x29;"), _
                  "%", "&#x25;"), "^", "&#x5E;"), """", "&#x22;"), _
                  "/", "&#x2F;"), "*", "&#x2A;"), "\", "&#x5C;"), _
                  "'", "&#x27;")

First of all, general advice: don't escape variables by hand when inserting into the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason not to escape explicitly is that it is only a matter of time before you forget it once, and once is enough.

If you're inserting plain text into the database, don't try to clean it on insert; clean it on display instead. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). Encoding on display means you no longer rely on the database contents being safe, which isn't necessarily a given.

If you're dealing with rich text (HTML), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTML Purifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that users can see their "cleaned up" HTML and correct the cleaned-up version.

Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.

I second Joeri: do not roll your own. Go here to see some of the many possible XSS attacks:

http://ha.ckers.org/xss.html

htmlentities() - turns text into HTML, converting characters to entities. If you are using UTF-8 encoding, use htmlspecialchars() instead, as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output, regardless of type or origin, unless I intend it to be HTML. There is only a tiny performance cost, and it is easier than trying to work out what needs escaping and what doesn't.
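
The difference between the two functions can be seen on a short UTF-8 string. This is only an illustrative sketch; the sample text is made up, and the `\u{E9}` escape (PHP 7+) is used so the example does not depend on the file's encoding.

```php
<?php
// "café <b>" with the é written as a Unicode escape.
$text = "caf\u{E9} <b>";

// htmlentities() converts every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() converts only the characters special to HTML.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $entities . "\n"; // caf&eacute; &lt;b&gt;
echo $special . "\n";  // the é is left as-is, only < and > are escaped
```

Both results are equally safe against tag injection; on a UTF-8 page the extra entities from htmlentities() add nothing.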

strip_tags() - turns HTML into text by removing all HTML tags. Use this to ensure that there is nothing nasty in your input, as an adjunct to escaping your output.

mysql_real_escape_string() - escapes a string for MySQL and is your defence against SQL injection from little Bobby Tables (better to use mysqli and prepare/bind, as escaping is then done for you and you can avoid lots of messy string concatenation).

The advice given above about avoiding HTML input unless it is essential, and opting for BBCode or similar (make up your own if need be), is very sound indeed.
