How do I wrap untagged HTML text with <p> tags using Nokogiri?
I have to parse an HTML document into different new files. The problem is that some text nodes are not wrapped in <p> tags; instead, they have a <br> tag at the end of each paragraph.
I want to wrap this text in <p> tags using Nokogiri:
<div id="f15"><b>Footnote 15</b>: Catullus iii, 12.</div>
<div class="pgmonospaced pgheader"><br/>
<br/>
End of the Project abc<br/>
<br/>
*** END OF THIS PROJECT XYZ ***<br/>
<br/>
***** This file should be named new file.html... *****<br/>
<br/></div>
After searching some forums and doing some debugging locally, I found the following solution to my problem.
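require 'nokogiri'

# Read the file first: Nokogiri::HTML expects markup (or an IO), not a path string.
html_doc = Nokogiri::HTML(File.read('path/to/html_file'))
# Select every text node that has a <br> sibling before or after it.
html_doc.search("//br/preceding-sibling::text()|//br/following-sibling::text()").each do |node|
  next if node.text.strip.empty? # skip whitespace-only nodes between <br> tags
  node.replace(Nokogiri.make("<p>#{node.to_html}</p>"))
end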