Prevent Javascript decoding encoded HTML

Let me start by saying JavaScript is not my strong point, and all of the searches I have done on this topic have only turned up advice on URL-encoding and decoding strings.

I'm having trouble with some code similar to the following:

<a href="#" onclick="<?php echo "alert('&#039;');"; ?>">test</a>

I would expect that, since the value being passed to alert is HTML encoded, clicking the link would show an alert box with the value &#039; in it.

It turns out that because it sits between the quotes of the onclick attribute, the browser decodes &#039; to a single quote before executing the code. The result is effectively alert('''); which obviously breaks horribly.

The following works just fine.

<script>alert('&#039;');</script>

Firstly, is there a way to disable this behaviour, or a clever workaround? (I'm guessing not)

My current solution is to decode the html encoded string, apply slashes to quotes, and then re-encode it. Obviously not very elegant.

Better solutions would be much appreciated.

That's the expected behaviour. HTML entities in the HTML source code are automatically converted when the browser parses the attribute. This allows website developers to include special characters, such as quotes in an attribute, without breaking the page.

Use htmlspecialchars to get the desired effect:

<a href="#" onclick="<?php echo htmlspecialchars("alert('&#039;');"); ?>">test</a>
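To see why the extra round of encoding helps, here is a rough JavaScript sketch of the decoding step the HTML parser applies to an attribute value. The function name decodeAttribute and the tiny entity table are assumptions made for illustration; a real parser recognises many more character references than this.

```javascript
// Simplified stand-in for the HTML parser's attribute decoding.
// Assumption: only decimal numeric references and four named entities are
// handled here; real browsers know far more.
const NAMED = { amp: '&', quot: '"', lt: '<', gt: '>' };

function decodeAttribute(raw) {
  // A single pass: text produced by one replacement is never decoded again.
  return raw.replace(/&(?:#(\d+)|(amp|quot|lt|gt));/g, (match, code, name) =>
    code !== undefined ? String.fromCodePoint(Number(code)) : NAMED[name]
  );
}

// What the JavaScript engine receives for the original attribute:
console.log(decodeAttribute("alert('&#039;');"));     // alert(''');
// What it receives after one extra round of HTML encoding:
console.log(decodeAttribute("alert('&amp;#039;');")); // alert('&#039;');
```

Since the parser decodes exactly once, encoding the & of the entity as &amp; leaves the literal text &#039; for the script to see.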

No, you have to do what you described, and for good reason: It's the onion layers thing.

Given your particular onion:

<a href="#" onclick="<?php echo "alert('&#039;');"; ?>">test</a>

The first layer is PHP, which when done will send this to the browser:

<a href="#" onclick="alert('&#039;');">test</a>

The next layer is the browser's HTML parser, which is responsible for all sorts of things, including creating DOM elements (and other kinds of nodes) and handling character entities. So the HTML parser creates an a element in memory:

+------------------------+
| a                      |
+------------------------+
| href: "#"              |
| onclick: "alert(''');" |
|                        |
+------------------------+

The next layer is the JavaScript execution. When the user clicks that a element, the browser passes the JavaScript engine the string contained by the onclick attribute, which the JavaScript engine must then parse — and of course, it throws a syntax error.

Each layer of this onion has its own grammar rules, and you have to write the text so that it is valid for each layer, based on what it will look like by the time that layer sees it.
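One way to apply the layering idea is to encode from the innermost layer outwards: first make the value a valid JavaScript string literal, then make the whole script safe for an HTML attribute. The sketch below does that in JavaScript; the helper names escapeHtmlAttribute and buildOnclick are hypothetical. On the PHP side, the same two steps could plausibly be done with json_encode followed by htmlspecialchars.

```javascript
// Characters that need escaping inside a quoted HTML attribute value.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };

// Outer layer: make arbitrary text safe inside an HTML attribute.
function escapeHtmlAttribute(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical helper that builds the onclick attribute value.
function buildOnclick(message) {
  // Inner layer: JSON.stringify yields a double-quoted JavaScript string literal.
  const js = 'alert(' + JSON.stringify(message) + ');';
  return escapeHtmlAttribute(js);
}

console.log(buildOnclick('&#039;')); // alert(&quot;&amp;#039;&quot;);
```

When the browser parses that attribute it undoes only the outer layer, leaving alert("&#039;"); for the JavaScript engine.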

This is because &#039; is decoded inside the HTML attribute. This is one reason you shouldn't put JavaScript inline in HTML.
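As a sketch of the non-inline alternative, the value can live in a data attribute and the handler can be attached from a script, so only the HTML layer has to be encoded for. The markup, the element id test-link and the function name attachMessageAlert are all assumptions for illustration.

```javascript
// Hypothetical markup this sketch assumes:
//   <a href="#" id="test-link" data-message="&amp;#039;">test</a>
// The HTML parser decodes the attribute once, so dataset.message is "&#039;".

function attachMessageAlert(link, show) {
  link.addEventListener('click', (event) => {
    event.preventDefault();     // keep the "#" href from changing the URL
    show(link.dataset.message); // the value never passes through a JS parser
  });
}

// In a browser, with the assumed element id:
// attachMessageAlert(document.getElementById('test-link'), (text) => alert(text));
```

The value is read as data rather than parsed as code, so quotes inside it cannot cause a syntax error.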

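You can split the HTML entity in two using concatenation:

<a href="#" onclick="alert('&#'+'039;');">test</a>

The HTML parser no longer sees a complete character reference, so it leaves the text alone, and JavaScript joins the two halves back together at run time.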
