
JavaScript generating invalid HTML5 attributes in Firefox

I am noticing some very strange behavior in Firefox, and I'm wondering if anyone has a strategy for normalizing or working around it.

Specifically, if you give Firefox a basic anchor containing HTML entities, it will unescape those entities, fail to re-escape them, and hand you back invalid HTML.

For example, Firefox mishandles the following link:

<a href="&gt;&lt;&quot;">My Original Link</a>

If this link is parsed by Firefox, it will unescape the &gt;&lt;&quot; and hand back a link like: <a href="><&quot;">My Original Link</a>

This same operation appears to work fine elsewhere, even in Safari and Edge.

I tried quite a few different ways of handing the HTML to Firefox to avoid this problem: manually invoking the parser, setting innerHTML, calling jQuery html(), giving the jQuery constructor a giant string, etc. All methods produced the same broken result.

See a fiddle here: https://jsfiddle.net/kamelkev/hfd2b6sn/

I am a little mystified by how broken this handling seems to be. There must be a way to work around this issue, but I can't seem to find a way.

My application is an HTML manipulation tool, so I typically normalize around issues like this by dropping down to XML and handling the problems there before persisting to a dumb key-value store. In this particular case, though, the < and > characters are preventing me from processing the document as XML.

Ideas?

A < or a > is valid inside of an attribute value, unescaped. It's not best practice, but it is valid.

What's happening is that Firefox is parsing the original HTML and making elements out of it. At that point, the original HTML no longer exists. When you call .outerHTML, the HTML is reconstructed from the element.

Firefox then generates it using a different set of rules than Chrome does.

It isn't clear what exactly you need to do this for... really you should edit the DOM and export the HTML for the whole DOM when done. Constantly re-interpreting HTML isn't necessary.

The &gt; and &lt; are unescaped when the parser parses the source to construct the DOM. When you serialize an element back to a string, you are not guaranteed to obtain the same text as the source.

In this case, innerHTML and outerHTML use the HTML fragment serialization algorithm, which escapes attribute values using attribute mode:

Escaping a string (for the purposes of the algorithm above) consists of running the following steps:

  1. Replace any occurrence of the "&" character by the string "&amp;".

  2. Replace any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;".

  3. If the algorithm was invoked in the attribute mode, replace any occurrences of the """ character by the string "&quot;".

  4. If the algorithm was not invoked in the attribute mode, replace any occurrences of the "<" character by the string "&lt;", and any occurrences of the ">" character by the string "&gt;".
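The four steps quoted above can be sketched as a plain function, which makes it easy to see what happens to the href value from the question. This is only an illustration of the quoted steps: the name escapeHtmlString and its boolean attributeMode parameter are made up here and are not a browser API.

```javascript
// Sketch of the escaping steps quoted above.
// `escapeHtmlString` is a hypothetical name, not a browser API.
function escapeHtmlString(text, attributeMode) {
  // Step 1: ampersands first, so entities added later are not double-escaped.
  let out = text.replace(/&/g, '&amp;');
  // Step 2: U+00A0 NO-BREAK SPACE.
  out = out.replace(/\u00A0/g, '&nbsp;');
  if (attributeMode) {
    // Step 3: in attribute mode, only the double quote is escaped.
    out = out.replace(/"/g, '&quot;');
  } else {
    // Step 4: outside attribute mode, < and > are escaped instead.
    out = out.replace(/</g, '&lt;').replace(/>/g, '&gt;');
  }
  return out;
}

// The parsed (already unescaped) value of href="&gt;&lt;&quot;".
const hrefValue = '><"';
console.log(escapeHtmlString(hrefValue, true));  // ><&quot;
console.log(escapeHtmlString(hrefValue, false)); // &gt;&lt;"
```

In attribute mode the quote comes back as &quot; while < and > pass through untouched, which matches the markup Firefox hands back.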

That's why " is escaped to &quot;, but < and > remain.

This is OK, because < and > are allowed in HTML double-quoted attribute values.

However, XML does not allow < and > in attribute values. If you want to get valid XHTML, use an XML serializer:

 // Serialize the first <a> element using XML rules instead of HTML rules
 var s = new XMLSerializer();
 var str = s.serializeToString(document.querySelector('a'));
 console.log(str);

 <a href="&gt;&lt;&quot;">My Original Link</a>
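To see why the XML route gives the asker what they need, here is a minimal sketch of XML-style attribute escaping written as a plain function. The name escapeXmlAttribute is made up for illustration. It assumes the four replacements shown are the relevant ones for attribute values, and it only approximates what XMLSerializer does rather than replacing it.

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted
// XML attribute value. Unlike HTML attribute mode, < and > are escaped too.
function escapeXmlAttribute(value) {
  return value
    .replace(/&/g, '&amp;')  // must run first to avoid double-escaping
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// The parsed href value from the question, re-escaped for XML.
const xmlHref = escapeXmlAttribute('><"');
console.log('<a href="' + xmlHref + '">My Original Link</a>');
// <a href="&gt;&lt;&quot;">My Original Link</a>
```

In a browser, XMLSerializer remains the better choice, since it handles the whole element tree rather than a single string.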


 