
How to prevent Nokogiri from encoding entities in HTML fragments

Nokogiri 1.5.0

I'm unable to output a parsed fragment containing a link with query parameters without the ampersand in the href being changed. The ampersand is replaced by its HTML entity.

f = Nokogiri::HTML.fragment(%q{<a href="http://example.com?this=1&that=2">Testing</a>})
f.to_s    # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"
f.to_html # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"

Passing an encoding, as in to_html(encoding: 'UTF-8') or to_html(encoding: 'US-ASCII'), does not help.

This would seem pretty common, parsing a valid link format and wanting to render that back as valid HTML.

The related question "How to make Nokogiri transparently return un/encoded Html entities untouched?" was no help.

Nokogiri's HTML parser automatically corrects errors in the source document. The naked ampersand in the URL is actually an error, so Nokogiri is correcting it. If you look at f.errors, you can see that it doesn't think that &that is a valid entity and is missing a semicolon, so it fixes the ampersand to &amp;, making it valid HTML.


 