How to change '&nbsp;' to '&#160;' in HTML using JSoup
I am using JSoup to parse an HTML file and remove elements that aren't valid in XML, because I need to apply XSLT to the file. The issue I am running into is the '&nbsp;' entities that exist in my document. I need to change them to the numeric character reference '&#160;' so that I can run the XSLT on the file.
So I want:
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
To be:
<p>&#160;</p>
<p>&#160;</p>
<p>&#160;</p>
<p>&#160;</p>
I tried using a text replace, but it didn't work:
Elements els = doc.body().getAllElements();
for (Element e : els) {
    List<TextNode> tnList = e.textNodes();
    for (TextNode tn : tnList) {
        String orig = tn.text();
        tn.text(orig.replaceAll("&nbsp;", "&#160;"));
    }
}
Code that performs the parsing:
File f = new File ("C:/Users/jrothst/Desktop/Test File.htm");
Document doc = Jsoup.parse(f, "UTF-8");
doc.outputSettings().syntax( Document.OutputSettings.Syntax.xml );
System.out.println("Starting parse..");
performConversion(doc);
String html = doc.toString();
System.out.println(html);
FileUtils.writeStringToFile(f, doc.outerHtml(), "UTF-8");
How can I make those changes happen using the JSoup library?
The following worked for me. You don't need to do any manual search and replace:
File f = new File ("C:/Users/seanbright/Desktop/Test File.htm");
Document doc = Jsoup.parse(f, "UTF-8");
doc.outputSettings()
    .syntax(Document.OutputSettings.Syntax.xml)
    .escapeMode(Entities.EscapeMode.xhtml);
System.out.println(doc.toString());
Input:
<html><head></head><body>&nbsp;</body></html>
Output:
<html><head></head><body>&#xa0;</body></html>
('&#xa0;' is the same thing as '&#160;', only in hexadecimal instead of decimal.)
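Any XML parser or XSLT processor should treat the hexadecimal and decimal references identically, so in most cases nothing more is needed. If some downstream tool really does insist on the decimal spelling, one possible approach is to post-process the serialized string. The following is only a sketch using the Java standard library: the class name NbspReferences and the helper hexRefsToDecimal are made up for illustration and are not part of JSoup. It is a plain regex pass, so it would also rewrite matching text inside comments or CDATA sections.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NbspReferences {
    // Matches hexadecimal numeric character references such as &#xa0;
    // (HTML also tolerates an uppercase X, so both are accepted here).
    private static final Pattern HEX_REF = Pattern.compile("&#[xX]([0-9a-fA-F]+);");

    /** Hypothetical helper: rewrites each hexadecimal reference as its decimal equivalent. */
    public static String hexRefsToDecimal(String markup) {
        Matcher m = HEX_REF.matcher(markup);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the hex digits, then emit the same code point in decimal.
            int codePoint = Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement("&#" + codePoint + ";"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // 0xA0 and 160 are the same code point: U+00A0 NO-BREAK SPACE.
        System.out.println(Integer.parseInt("a0", 16) == 160);
        // prints true

        System.out.println(hexRefsToDecimal("<html><head></head><body>&#xa0;</body></html>"));
        // prints <html><head></head><body>&#160;</body></html>
    }
}
```

In the code above, the string returned by doc.outerHtml() (or doc.toString()) could be passed through hexRefsToDecimal before it is written back to the file.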