How to change '&nbsp;' to '&#160;' in HTML using JSoup

I am using JSoup to parse an HTML file and remove elements that aren't valid in XML, because I need to apply XSLT to the file. The issue I am running into is the '&nbsp;' entities that exist in my document. I need to change them to the numeric character reference '&#160;' so that I can run the XSLT on the file.

So I want:

<p> &nbsp; </p> 
<p> &nbsp; </p> 
<p> &nbsp; </p> 
<p> &nbsp; </p> 

to become:

<p> &#160; </p> 
<p> &#160; </p> 
<p> &#160; </p> 
<p> &#160; </p> 

I tried using a text replace, but it didn't work:

Elements els = doc.body().getAllElements();
for (Element e : els) {
    List<TextNode> tnList = e.textNodes();
    for (TextNode tn : tnList){
        String orig = tn.text();
        tn.text(orig.replaceAll("&nbsp;","&#160;")); 
    }
}

Code that performs the parsing:

File f = new File ("C:/Users/jrothst/Desktop/Test File.htm");

Document doc = Jsoup.parse(f, "UTF-8");
doc.outputSettings().syntax( Document.OutputSettings.Syntax.xml );  
System.out.println("Starting parse..");
performConversion(doc);

String html = doc.toString();
System.out.println(html);
FileUtils.writeStringToFile(f, doc.outerHtml(), "UTF-8");

How can I make those changes using the JSoup library?

The following worked for me. You don't need to do any manual search and replace:

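import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

File f = new File ("C:/Users/seanbright/Desktop/Test File.htm");

Document doc = Jsoup.parse(f, "UTF-8");
// Serialize as XML and restrict named entities to the XHTML set
// (lt, gt, amp, quot), so &nbsp; is written as a numeric reference.
doc.outputSettings()
    .syntax(Document.OutputSettings.Syntax.xml)
    .escapeMode(Entities.EscapeMode.xhtml);

System.out.println(doc.toString());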

Input:

<html><head></head><body>&nbsp;</body></html>

Output:

<html><head></head><body>&#xa0;</body></html>

(&#xa0; is the same thing as &#160;, only written in hexadecimal instead of decimal.)
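
XML parsers should treat the hexadecimal and decimal forms identically, so XSLT is unlikely to need anything more. If some downstream tool does insist on the decimal form, one option is to post-process the serialized string. The following is a minimal sketch using only the Java standard library, not a JSoup feature; the class name NumericEntityDemo and the method hexRefsToDecimal are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDemo {

    // Matches hexadecimal numeric character references such as &#xa0;
    private static final Pattern HEX_REF = Pattern.compile("&#[xX]([0-9a-fA-F]+);");

    /** Rewrites every hexadecimal reference in the markup as its decimal equivalent. */
    public static String hexRefsToDecimal(String markup) {
        Matcher m = HEX_REF.matcher(markup);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the hex digits, then emit the same code point in decimal.
            // Assumes the value fits in an int, which holds for valid references.
            int codePoint = Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement("&#" + codePoint + ";"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String serialized = "<html><head></head><body>&#xa0;</body></html>";
        // prints <html><head></head><body>&#160;</body></html>
        System.out.println(hexRefsToDecimal(serialized));
        // 0xa0 and 160 name the same code point, U+00A0 (no-break space); prints true
        System.out.println(Integer.parseInt("a0", 16) == 160);
    }
}
```

In the answer's code, this would be applied to the result of doc.toString() before writing the file. Named entities and decimal references are left untouched because the pattern only matches the "&#x...;" form.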
