
PHP: HTML Attribute Encoding / JavaScript Decoding

What's the proper way to encode untrusted data for an HTML attribute context? For example:

<input type="hidden" value="<?php echo $data; ?>" />

I usually use htmlentities() or htmlspecialchars() to do this:

<input type="hidden" value="<?php echo htmlentities($data); ?>" />

However, I recently ran into an issue where this broke my application. The data I needed to pass was a URL, which had to be handed off to JavaScript to change the page location:

<input id="foo" type="hidden" value="foo?bar=1&amp;baz=2" />
<script>
    // ...
    window.location = document.getElementById('foo').value;
    // ...
</script>

In this case, foo is a C program; it doesn't understand the encoded characters in the URL, and it segfaults.

I can simply grab the value in JavaScript and do something like value.replace('&amp;', '&'), but that seems kludgy, and only works for ampersands.

So, my question is: is there a better way to go about the encoding or decoding of data that gets injected into HTML attributes?

I have read all of OWASP's XSS Prevention Cheatsheet, and it sounds to me as though, as long as I'm careful to quote my attributes, the only character I need to encode is the quote itself ( " ). In that case, I could use something like str_replace('"', '&quot;', ...), but I'm not sure if I'm understanding it properly.

Your current method of using htmlentities() or htmlspecialchars() is the right approach.

The example you provided is correct HTML:

<input id="foo" type="hidden" value="foo?bar=1&amp;baz=2" />

The ampersand in the value attribute does indeed need to be HTML encoded, otherwise your HTML is invalid. Most browsers would parse it correctly with a bare & in there, but that doesn't change the fact that it's invalid, and you are correct to be encoding it.

Your problem lies not in the encoding of the value, which is good, but in the fact that you're using JavaScript code that doesn't decode it properly.

In fact, I'm surprised at this, because your JS code is accessing the DOM, and the DOM should be returning the decoded values.

I wrote a JSfiddle to prove this to myself: http://jsfiddle.net/qRd4Z/

Running this gives me an alert box with the decoded value, as I expected. Changing it to console.log also gives the result I expect. So I'm not sure why you're getting different results. Perhaps you're using a different browser? It might be worth specifying which one you're testing with. Or perhaps you've double-encoded the entities by mistake? Can you confirm that's not the case?

What's the proper way to encode untrusted data for HTML attribute context?

If you add double quotes around the attribute value, htmlspecialchars() is enough.

  <input id="foo" type="hidden" value="foo?bar=1&amp;baz=2" /> 

This is correct, and the browser will send foo?bar=1&baz=2 (with &amp; decoded) to the server. If the server isn't seeing foo?bar=1&baz=2, you must have encoded the value twice.

Getting the value in JavaScript should return foo?bar=1&baz=2 too (e.g. document.getElementById('foo').value must return foo?bar=1&baz=2).

View the source of the page in your browser and check the actual markup of the input field.

If you are modifying the input field's value using JavaScript, then the script must be double-encoding it.
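To make the double-encoding symptom concrete, here is a minimal sketch in JavaScript. `encodeForAttribute` and `looksDoubleEncoded` are hypothetical helpers: the first only approximates what htmlspecialchars() does on the PHP side, and the second is a rough heuristic, not a definitive test.

```javascript
// Hypothetical helper approximating PHP's htmlspecialchars() for
// illustration only; it is not the PHP implementation.
function encodeForAttribute(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so the entities added below are not re-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Heuristic: an "&amp;" immediately followed by the rest of another entity
// usually means the value went through the encoder twice.
function looksDoubleEncoded(text) {
  return /&amp;(amp|lt|gt|quot|#0?39);/.test(text);
}

const url = 'foo?bar=1&baz=2';
const once = encodeForAttribute(url);   // foo?bar=1&amp;baz=2
const twice = encodeForAttribute(once); // foo?bar=1&amp;amp;baz=2

console.log(once, looksDoubleEncoded(once));
console.log(twice, looksDoubleEncoded(twice));
```

The browser decodes an attribute value exactly once, so if the page source contains the `twice` form, reading .value yields foo?bar=1&amp;baz=2, which matches the behaviour described in the question.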

BTW, your program shouldn't segfault because of bad user input ;)

You can use the DOM to decode the value:

function decodeHTMLSpecialChars(input){
  var div = document.createElement('div');
  div.innerHTML = input;
  return div.childNodes.length === 0 ? "" : div.childNodes[0].nodeValue;
}

This will convert the following string:

'http://someurl.com/foo?bar=1&amp;baz=2'

to this:

decodeHTMLSpecialChars('http://someurl.com/foo?bar=1&amp;baz=2');
// => 'http://someurl.com/foo?bar=1&baz=2'
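One caveat with the DOM approach: assigning text to innerHTML makes the browser parse it as markup, which can be risky when the input isn't under your control. As an alternative, here is a DOM-free sketch that reverses only the entities htmlspecialchars() can produce. The name `decodeSpecialCharsOnly` and its entity table are my own assumptions, not part of any library.

```javascript
// Only the entities that PHP's htmlspecialchars() can emit.
const ENTITY_TO_CHAR = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#039;': "'",
  '&#39;': "'",
  '&apos;': "'",
};

// Hypothetical helper: decodes in a single pass, so "&amp;lt;" becomes
// "&lt;" rather than "<". Any other entity is left untouched.
function decodeSpecialCharsOnly(input) {
  return input.replace(
    /&(?:amp|lt|gt|quot|apos|#0?39);/g,
    (entity) => ENTITY_TO_CHAR[entity]
  );
}

console.log(decodeSpecialCharsOnly('http://someurl.com/foo?bar=1&amp;baz=2'));
```

Unlike value.replace('&amp;', '&'), this handles every occurrence and all of the special characters, not just the first ampersand. It should only be needed when the string never passed through the browser's HTML parser in the first place.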

And no, there isn't a better way: for HTML encoding and decoding, escaping with htmlspecialchars() is the standard method, and it is doing the job just fine for you.

Could you not just use the html_entity_decode function in PHPJS?

http://phpjs.org/functions/html_entity_decode

Other than that, you could base64-encode your data instead...
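As a rough sketch of the base64 idea: the base64 alphabet (A-Z, a-z, 0-9, +, / and =) contains nothing that needs HTML escaping, so the value passes through the attribute unchanged and is decoded in JavaScript. On the PHP side this would pair with base64_encode(); in this sketch btoa() stands in for it, and the sample URL is the one from the question.

```javascript
// btoa() stands in for PHP's base64_encode() in this sketch.
// Note: btoa()/atob() only handle Latin-1 text; other input needs extra steps.
const original = 'foo?bar=1&baz=2';
const attributeValue = btoa(original);

// Check that none of the HTML-special characters appear in the encoded value.
const needsHtmlEscaping = /[&<>"']/.test(attributeValue);

// What the page script would do after reading the attribute value.
const decoded = atob(attributeValue);

console.log(attributeValue, needsHtmlEscaping, decoded);
```

The trade-off is that the value in the page source is no longer human-readable, and base64 is an encoding, not a security measure.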

Please note that using htmlentities() with its default flags may not be enough!

With ENT_COMPAT (the default before PHP 8.1), the only markup-significant characters it encodes are " < > &

It doesn't escape ' which can create a problem in single-quoted attributes!

Make sure you pass flags such as ENT_QUOTES to these functions; you can find the usage and examples here
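To illustrate why the unescaped single quote matters, here is a sketch in JavaScript. The two helpers only mimic the ENT_COMPAT and ENT_QUOTES behaviour of htmlspecialchars() for demonstration. Their names are hypothetical, and the payload is a made-up example.

```javascript
// Mimics htmlspecialchars($s, ENT_COMPAT): the single quote is left alone.
function escapeCompat(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Mimics htmlspecialchars($s, ENT_QUOTES): single quotes are encoded too.
function escapeQuotes(text) {
  return escapeCompat(text).replace(/'/g, '&#039;');
}

// Made-up untrusted input aimed at a single-quoted attribute.
const payload = "x' onmouseover='alert(1)";

// The attribute ends at the raw quote, so the rest becomes a new attribute.
const unsafeMarkup = "<input value='" + escapeCompat(payload) + "'>";
// The quote is encoded, so the whole payload stays inside the value.
const safeMarkup = "<input value='" + escapeQuotes(payload) + "'>";

console.log(unsafeMarkup);
console.log(safeMarkup);
```

With double-quoted attributes, as in the question, the ENT_COMPAT behaviour is sufficient. The single-quote case only bites when the attribute itself is delimited by single quotes.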
