How to prevent Nokogiri from encoding entities in HTML fragments

Nokogiri 1.5.0

I can't output a parsed fragment that contains a link with query parameters without the ampersand in the href being changed. The ampersand is replaced by its HTML entity.

require 'nokogiri'
f = Nokogiri::HTML.fragment(%q{<a href="http://example.com?this=1&that=2">Testing</a>})
f.to_s    # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"
f.to_html # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"

Passing to_html(encoding: 'UTF-8') or US-ASCII doesn't help.

This seems like a common need: parsing a valid link format and wanting to render it back as valid HTML.

The question "How to make Nokogiri transparently return un/encoded Html entities untouched?" was no help.

Nokogiri's HTML parser automatically corrects errors in the source document. The naked ampersand in the URL is actually an error, so Nokogiri is correcting it. If you look at f.errors, you can see that it doesn't think that &that is a valid entity and that it is missing a semicolon, so it changes the ampersand to &amp;, making it valid HTML.
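To illustrate why the escaped output is harmless, here is a minimal sketch that uses only Ruby's standard library rather than Nokogiri itself. The variable names are made up for illustration. It shows that the &amp; written into the attribute decodes back to the original URL, which is what a browser does when it reads the href.

```ruby
require 'cgi'

# The URL as intended, with a bare ampersand between query parameters.
raw_url = 'http://example.com?this=1&that=2'

# The form a serializer writes inside an HTML attribute value.
escaped = CGI.escapeHTML(raw_url)

# The value an HTML consumer recovers when it decodes the attribute.
decoded = CGI.unescapeHTML(escaped)

puts escaped            # the text that appears in the markup
puts decoded            # the URL that is actually requested
puts decoded == raw_url # whether the round trip preserved the URL
```

In Nokogiri, reading the attribute from the node (for example f.at('a')['href']) should likewise give the decoded value, so the entity only appears in the serialized markup.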

