How to prevent Nokogiri from encoding entities in an HTML fragment

Question

Nokogiri 1.5.0

I'm unable to output a parsed fragment containing a link with query parameters. Specifically, the ampersand in the href is replaced by its HTML entity.

require 'nokogiri'

f = Nokogiri::HTML.fragment(%q{<a href="http://example.com?this=1&that=2">Testing</a>})
f.to_s    # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"
f.to_html # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"

Using to_html(encoding: 'UTF-8') or 'US-ASCII' doesn't help.

This seems like a pretty common case: parsing a valid link and wanting to render it back as valid HTML.

The question "How to make Nokogiri transparently return un/encoded Html entities untouched?" was no help.

Answer 1

Nokogiri's HTML parser automatically corrects errors in the source document. The naked ampersand in the URL is actually an error, so Nokogiri is correcting it. If you look at f.errors, you can see that it doesn't consider &that a valid entity and reports it as missing a semicolon, so it fixes the ampersand to &amp;, making the output valid HTML.
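A minimal sketch of why the &amp; form is harmless. It uses Ruby's standard-library CGI escaping helpers rather than Nokogiri itself (an assumption made so the example runs without the gem): escaping the URL for an HTML attribute produces the same &amp; form Nokogiri emits, and decoding it, as a browser does when it reads the href, gives back the original URL.

```ruby
require 'cgi/escape' # stdlib helpers, used here in place of Nokogiri

# The raw URL from the question's source fragment.
url = "http://example.com?this=1&that=2"

# Escaping for an HTML attribute turns the bare & into &amp;,
# matching what Nokogiri's serializer writes out.
escaped = CGI.escapeHTML(url)
puts escaped # => http://example.com?this=1&amp;that=2

# Decoding the attribute value (what a browser does with href)
# restores the original URL, so the link target is unchanged.
decoded = CGI.unescapeHTML(escaped)
puts decoded        # => http://example.com?this=1&that=2
puts decoded == url # => true
```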

Answer 1: score 4, accepted, 2012-03-01 17:44:42
