XSS - Which HTML Tags and Attributes can trigger Javascript Events?

I'm trying to code a secure and lightweight whitelist-based HTML purifier which will use DOMDocument. In order to avoid unnecessary complexity, I am willing to make the following compromises:

  • HTML comments are removed
  • script and style tags are stripped altogether
  • only the child nodes of the body tag will be returned
  • all HTML attributes that can trigger Javascript events will either be validated or removed

I've been reading a lot about XSS attacks and prevention, and I hope I'm not being too naive (if I am, please let me know!) in assuming that if I follow all the rules I mentioned above, I will be safe from XSS.

The problem is that I am not sure what other tags and attributes (in any [X]HTML version and/or browser versions/implementations) can trigger Javascript events, besides the default Javascript event attributes:

  • onAbort
  • onBlur
  • onChange
  • onClick
  • onDblClick
  • onDragDrop
  • onError
  • onFocus
  • onKeyDown
  • onKeyPress
  • onKeyUp
  • onLoad
  • onMouseDown
  • onMouseMove
  • onMouseOut
  • onMouseOver
  • onMouseUp
  • onMove
  • onReset
  • onResize
  • onSelect
  • onSubmit
  • onUnload
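Rather than trying to enumerate every handler (later HTML versions add many more), a sanitizer can treat any attribute whose name begins with "on" as an event handler and drop it. A minimal sketch in Java, assuming attribute names arrive as plain strings from whatever parser is used; the class and method names are hypothetical:

```java
import java.util.Locale;

/** Hypothetical helper: flags attribute names that look like inline event handlers. */
final class EventAttributeFilter {
    private EventAttributeFilter() {}

    /**
     * Returns true for any attribute whose name starts with "on", compared
     * case-insensitively, so onClick, ONLOAD and newer handlers such as
     * onpointerdown are all caught without maintaining a list.
     */
    static boolean isEventHandlerAttribute(String attributeName) {
        // Trim and lowercase with Locale.ROOT so the result does not depend
        // on the JVM's default locale.
        String name = attributeName.trim().toLowerCase(Locale.ROOT);
        return name.startsWith("on");
    }
}
```

In a sanitizer, removing an occasional harmless attribute that happens to start with "on" is the safe failure mode. This only covers event attributes; URL-bearing attributes and style need their own checks.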

Are there any other non-default or proprietary event attributes that can trigger Javascript (or VBScript, etc.) events or code execution? I can think of href, style and action, for instance:

<a href="javascript:alert(document.location);">XSS</a> // or
<b style="width: expression(alert(document.location));">XSS</b> // or
<form action="javascript:alert(document.location);"><input type="submit" /></form>

I will probably just remove any style attributes in the HTML tags. The action and href attributes pose a bigger challenge, but I think the following code is enough to make sure their value is either a relative or absolute URL and not some nasty Javascript code:

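// $attribute is a DOMAttr (e.g. href or action) belonging to the DOMElement $node
$value = $attribute->value;

// A value containing ':' is kept only if it starts with http(s), ftp(s), sftp(s) or mailto;
// anything else (javascript:, vbscript:, data:, ...) causes the attribute to be removed
if ((strpos($value, ':') !== false) && (preg_match('~^(?:(?:s?f|ht)tps?|mailto):~i', $value) == 0))
{
    $node->removeAttributeNode($attribute);
}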
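To see how this check behaves on awkward inputs, here is a hypothetical Java port of the same logic (same regular expression, same condition, inverted to answer "keep?"). Because a value is kept only when an allowed scheme sits at the very start, leading whitespace, mixed case or an embedded tab before the colon do not slip through; the cost is that relative URLs with a colon in the query string are dropped too.

```java
import java.util.regex.Pattern;

/** Hypothetical Java port of the PHP check, for experimenting with inputs. */
final class UrlSchemeCheck {
    private UrlSchemeCheck() {}

    // Same pattern as the PHP: http, https, ftp, ftps, sftp, sftps or mailto,
    // anchored at the very start of the value, case-insensitive.
    private static final Pattern ALLOWED_SCHEME =
            Pattern.compile("^(?:(?:s?f|ht)tps?|mailto):", Pattern.CASE_INSENSITIVE);

    /** Returns true if the attribute should be kept (the negation of the PHP removal condition). */
    static boolean keep(String value) {
        // No colon at all: treated as a relative URL and kept.
        if (value.indexOf(':') < 0) {
            return true;
        }
        // A colon is present: keep only when an allowed scheme starts the value.
        return ALLOWED_SCHEME.matcher(value).find();
    }
}
```

One caveat: this assumes the value has already been entity-decoded by the parser (DOM attribute values normally are). Run on raw markup, an entity-encoded colon such as javascript&#58;alert(1) contains no literal ':' and would be kept.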

So, my two obvious questions are:

  1. Am I missing any tags or attributes that can trigger events?
  2. Is there any attack vector that is not covered by these rules?

After a lot of testing, pondering and researching, I've come up with the following (rather simple) implementation, which appears to be immune to any XSS attack vector I could throw at it.

I highly appreciate all your valuable answers, thanks.

You mention href and action as places where javascript: URLs can appear, but you're missing the src attribute, among a bunch of other URL-loading attributes.

Line 399 of the OWASP Java HTMLPolicyBuilder is the definition of URL attributes in a whitelisting HTML sanitizer.

private static final Set<String> URL_ATTRIBUTE_NAMES = ImmutableSet.of(
    "action", "archive", "background", "cite", "classid", "codebase", "data",
    "dsync", "formaction", "href", "icon", "longdesc", "manifest", "poster",
    "profile", "src", "usemap");
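In the question's setup, this list means the scheme check should run for every one of these attributes, not only href and action. A minimal sketch of that lookup, using the JDK's Set.of (Java 9+) in place of Guava's ImmutableSet; the class and method names are hypothetical:

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper built on the attribute names quoted above. */
final class UrlAttributes {
    private UrlAttributes() {}

    // Same names as the OWASP list; Set.of produces an immutable set.
    private static final Set<String> URL_ATTRIBUTE_NAMES = Set.of(
            "action", "archive", "background", "cite", "classid", "codebase", "data",
            "dsync", "formaction", "href", "icon", "longdesc", "manifest", "poster",
            "profile", "src", "usemap");

    /** True if the attribute's value should be validated as a URL before it is kept. */
    static boolean isUrlAttribute(String attributeName) {
        // HTML attribute names are case-insensitive, so normalise before the lookup.
        return URL_ATTRIBUTE_NAMES.contains(attributeName.toLowerCase(Locale.ROOT));
    }
}
```

Any attribute for which this returns true would then go through a scheme whitelist like the one in the question.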

The HTML5 Index contains a summary of attribute types. It doesn't mention some conditional things like <input type=URL value=...>, but if you scan that list for "valid URL" and friends, you should get a decent idea of what HTML5 adds. The set of HTML 4 attributes with type %URI is also informative.

Your protocol whitelist looks very similar to the OWASP sanitizer's. The addition of ftp and sftp looks innocuous enough.

A good source of security-related schema info for HTML elements and attributes is the set of Caja JSON whitelists used by the Caja JS HTML sanitizer.

How are you planning on rendering the resulting DOM? If you're not careful, then even if you strip out all the <script> elements, an attacker might get a buggy renderer to produce content that a browser interprets as containing a <script> element. Consider the following valid HTML, which does not contain a script element:

<textarea><&#47;textarea><script>alert(1337)</script></textarea>

A buggy renderer might output the contents of this as:

<textarea></textarea><script>alert(1337)</script></textarea>

which does contain a script element.
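The rendering-side fix is to escape text content every time it is written back out, instead of emitting the decoded characters verbatim. A minimal Java sketch, assuming the renderer holds the textarea's decoded text as a string; the helper names are hypothetical:

```java
/** Hypothetical serializer helper: escapes text before it is written back out as HTML. */
final class HtmlText {
    private HtmlText() {}

    /** Replaces the characters that can end text context or start markup. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Re-serializes a textarea from its decoded text content. */
    static String textarea(String decodedText) {
        return "<textarea>" + escape(decodedText) + "</textarea>";
    }
}
```

Fed the decoded content of the example above, "</textarea><script>alert(1337)</script>", this produces <textarea>&lt;/textarea&gt;&lt;script&gt;alert(1337)&lt;/script&gt;</textarea>, so the browser sees only text inside a single textarea.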

(Full disclosure: I wrote chunks of both HTML sanitizers mentioned above.)

Garuda has already given what I would deem the "correct" answer, and his links are very useful, but he beat me to the punch!

I'm giving my answer only to reinforce his.

As the HTML and ECMAScript specs keep gaining features, avoiding script injection and other such vulnerabilities in HTML becomes more and more difficult. With each new addition, a whole world of possible injections is introduced. On top of that, different browsers probably have different ideas of how to implement these specs, so you get even more possible vulnerabilities.

Take a look at a short list of vectors introduced by HTML5.

The best solution is to choose what you will allow rather than what you will deny. It is much easier to say "These tags, and these attributes for those given tags, are the only ones allowed. Everything else will be sanitized accordingly or thrown out."
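In code, "choose what you will allow" amounts to a lookup table in which anything missing is denied by default. A sketch in Java; the particular tags and attributes listed are only an illustration, not a recommended set, and the class name is hypothetical:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical allow-list: each tag maps to the only attributes permitted on it. */
final class AllowList {
    private AllowList() {}

    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "p", Set.of(),
            "b", Set.of(),
            "i", Set.of(),
            "ul", Set.of(),
            "li", Set.of(),
            "a", Set.of("href", "title"),
            "img", Set.of("src", "alt"));

    /** A tag is kept only if it appears in the map; everything else is thrown out. */
    static boolean isTagAllowed(String tag) {
        return ALLOWED.containsKey(tag.toLowerCase(Locale.ROOT));
    }

    /** An attribute is kept only if it is listed for that specific tag. */
    static boolean isAttributeAllowed(String tag, String attribute) {
        Set<String> attrs = ALLOWED.get(tag.toLowerCase(Locale.ROOT));
        return attrs != null && attrs.contains(attribute.toLowerCase(Locale.ROOT));
    }
}
```

Allowed URL attributes such as href and src still need their values validated; the table only decides which names may appear at all.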

It would be very irresponsible of me to compile a list and say "okay, here you go: here's a list of all of the injection vectors you missed. You can sleep easy." In fact, there are probably many injection vectors not yet known to black hats or white hats. As the ha.ckers website states, script injection is really limited only by the mind.

I'd like to answer your specific question at least a little, so here are some glaring omissions from your blacklist:

  • The img src attribute. Note that src is also a valid attribute on other elements and could be harmful there too. img also has dynsrc and lowsrc, and maybe even more.
  • type and language attributes
  • CDATA sections, in addition to plain HTML comments.
  • Improperly sanitized input values. This may not be a problem, depending on how strict your HTML parsing is.
  • Any ambiguous special characters. In my opinion, even unambiguous ones should probably be encoded.
  • Missing or incorrect quotes on attributes (such as grave accents).
  • Premature closing of textarea tags.
  • UTF-8 (and UTF-7) encoded characters in scripts
  • Even though you will only return child nodes of the body tag, many browsers will still evaluate head and html elements inside of body, and most head-only elements inside of body anyway, so this probably won't help much.
  • In addition to CSS expressions, background-image expressions
  • frames and iframes
  • embed, and probably object and applet
  • Server-side includes
  • PHP tags
  • Any other injections (SQL injection, executable injection, etc.)
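Several of these items (ambiguous special characters, missing or odd attribute quotes) can be neutralised on output by always writing attribute values double-quoted and escaped, whatever quoting the input used. A minimal Java sketch with a hypothetical helper name; escaping the backtick reflects the fact that older versions of Internet Explorer treated it as an attribute delimiter:

```java
import java.util.Locale;

/** Hypothetical helper: writes one attribute with an always-double-quoted, escaped value. */
final class AttributeWriter {
    private AttributeWriter() {}

    static String write(String name, String value) {
        StringBuilder out = new StringBuilder();
        // Lowercase the name, which also keeps the output valid XHTML.
        out.append(name.toLowerCase(Locale.ROOT)).append("=\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break; // cannot close the value early
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '`': out.append("&#96;"); break;  // old IE treated backticks as quotes
                default: out.append(c);
            }
        }
        return out.append('"').toString();
    }
}
```

A value like a" onmouseover="alert(1) then stays inside a single attribute instead of introducing a new event handler.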

By the way, I'm sure this doesn't affect you, but camelCased attributes are invalid XHTML and should be lowercased.

You might want to check out these 2 links for additional reference:

http://adamcecc.blogspot.com/2011/01/javascript.html (this is only applicable when your 'filtered' input is ever going to end up between script tags on a page)

http://ha.ckers.org/xss.html (which lists a lot of browser-specific event triggers)

I've used HTML Purifier for this reason too, as you are doing, in combination with a WYSIWYG editor. What I did differently was to use a very strict whitelist with a couple of basic markup tags and attributes available, and to expand it when the need arose. This keeps you from getting attacked by very obscure vectors (like the first link above), and you can dig into each newly needed tag/attribute one by one.

Just my 2 cents.

Don't forget the HTML5 JavaScript event handlers:

http://www.w3schools.com/html5/html5_ref_eventattributes.asp
