Convert HTML characters back to text using the Java standard library

Question

I would like to convert some HTML character entities back to text using the Java standard library. Is there a library that can do this?

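import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class Main {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // "Happy & Sad" in HTML form.
        String s = "Happy &amp; Sad";
        System.out.println(s);

        try {
            // Intended to produce "Happy & Sad". DOESN'T WORK:
            // URLDecoder decodes '+' and %XX escapes, not HTML entities such as &amp;.
            s = URLDecoder.decode(s, "UTF-8");
            System.out.println(s); // still prints "Happy &amp; Sad"
        } catch (UnsupportedEncodingException ex) {
            // Not reachable in practice: UTF-8 is always supported.
        }
    }
}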

Answer 1

I think the Apache Commons Lang library's StringEscapeUtils.unescapeHtml3() and unescapeHtml4() methods are what you are looking for. The Commons Lang 3 version of this class is deprecated in favor of the one in Apache Commons Text, documented here: https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html

Answer 2

Just add the jsoup JAR to your application's classpath (for example, its lib folder) and then use this code:

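import org.jsoup.Jsoup;

public class Encoder {
    public static void main(String[] args) {
        // Parse the fragment as HTML and take its text: entities are decoded,
        // but any tags are dropped and whitespace is normalized.
        String s = Jsoup.parse("&lt;Fran&ccedil;ais&gt;").text();
        System.out.print(s); // <Français>
    }
}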

Link to download jsoup: http://jsoup.org/download

Answer 3

java.net.URLDecoder deals only with the application/x-www-form-urlencoded MIME format (e.g. "%20" represents a space), not with HTML character entities. I don't think there's anything on the Java platform for that. You could write your own utility class to do the conversion, like this one.
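As a rough idea of what such a utility class might look like, here is a minimal JDK-only sketch (assuming Java 9+ for Map.of). The class name HtmlUnescaper and its entity table are hypothetical: it decodes decimal and hexadecimal numeric references but only a handful of named entities, so it is not a complete HTML5 decoder.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUnescaper {
    // Hypothetical, deliberately small table of named entities; a full decoder
    // would need the complete HTML5 named character reference list.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    // Matches &#123; (decimal), &#x7B; (hex) and &name; references, each ending in ';'.
    private static final Pattern ENTITY =
            Pattern.compile("&(?:#(\\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z][a-zA-Z0-9]*));");

    public static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());          // copy text before the reference
            String replacement = decode(m);
            // Unknown or invalid references are left exactly as written.
            out.append(replacement != null ? replacement : m.group());
            last = m.end();
        }
        out.append(s, last, s.length());             // copy the remaining tail
        return out.toString();
    }

    private static String decode(Matcher m) {
        try {
            if (m.group(1) != null) return fromCodePoint(Integer.parseInt(m.group(1)));
            if (m.group(2) != null) return fromCodePoint(Integer.parseInt(m.group(2), 16));
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
        return NAMED.get(m.group(3));
    }

    private static String fromCodePoint(int cp) {
        return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
    }

    public static void main(String[] args) {
        System.out.println(unescape("Happy &amp; Sad"));            // Happy & Sad
        System.out.println(unescape("GU&#205;A TELEF&#xD3;NICA"));  // GUÍA TELEFÓNICA
        System.out.println(unescape("&lt;b&gt; &unknown;"));         // <b> &unknown;
    }
}
```

The input is scanned once, so an escaped entity such as "&amp;lt;" becomes "&lt;" rather than being decoded twice to "<".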

Answer 4

URLDecoder should only be used to decode strings from URLs generated by HTML forms, which use the "application/x-www-form-urlencoded" MIME type. It does not handle HTML character entities.
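The difference is easy to see side by side. This small JDK-only sketch (the class and method names are hypothetical) decodes a form-encoded string and an HTML-escaped string with the same call: only the first one changes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecoderDemo {
    // Decodes application/x-www-form-urlencoded text: '+' becomes a space and
    // %XX sequences become the bytes they encode, read here as UTF-8.
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Form-encoded: %26 is '&' and '+' is a space, so this prints "Happy & Sad".
        System.out.println(formDecode("Happy+%26+Sad"));
        // HTML entities are not URL escapes, so "&amp;" passes through unchanged.
        System.out.println(formDecode("Happy &amp; Sad"));
    }
}
```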

After a search, I found a Translate class in the HTML Parser library.

Answer 5

You can use the class org.apache.commons.lang.StringEscapeUtils (Commons Lang 2.x):

String s = StringEscapeUtils.unescapeHtml("Happy &amp; Sad"); // s is "Happy & Sad"

It works.

Answer 6

I'm not aware of any way to do it using the standard library, but I do know and use this class that deals with HTML entities:

"HTMLEntities is an Open Source Java class that contains a collection of static methods (htmlentities, unhtmlentities, ...) to convert special and extended characters into HTML entities and vice versa."

http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=htmlentities

Answer 7

As @jem suggested, it is possible to use jsoup.

Since jsoup 1.8.3 you can use the method Parser.unescapeEntities, which decodes entities while leaving the rest of the original HTML (including tags) intact.

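import org.jsoup.parser.Parser;
...
String original_html = "<p>Happy &amp; Sad</p>"; // example input
// Second argument (inAttribute) is false because this text is not an attribute value.
String html = Parser.unescapeEntities(original_html, false); // "<p>Happy & Sad</p>"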

This method does not seem to be present in some earlier releases.

Answer 8

Or you can use StringEscapeUtils.unescapeHtml4:

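    // StringEscapeUtils from Apache Commons Lang 3 (org.apache.commons.lang3)
    // or Apache Commons Text (org.apache.commons.text)
    String miCadena = "GU&#205;A TELEF&#211;NICA";
    System.out.println(StringEscapeUtils.unescapeHtml4(miCadena));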

This code prints the line: GUÍA TELEFÓNICA
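The numeric references in that example are Unicode code points written in decimal, which is why &#205; turns into Í and &#211; into Ó. A tiny JDK-only sketch (hypothetical class and method names) makes the arithmetic visible using Character.toChars and Character.getName.

```java
public class NumericReferenceDemo {
    // Describes the character behind a decimal numeric reference such as &#205;:
    // its hex code point, the character itself, and its Unicode name.
    static String describe(int codePoint) {
        return codePoint + " = U+" + String.format("%04X", codePoint) + " = "
                + new String(Character.toChars(codePoint))
                + " (" + Character.getName(codePoint) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(205)); // 205 = U+00CD = Í (LATIN CAPITAL LETTER I WITH ACUTE)
        System.out.println(describe(211)); // 211 = U+00D3 = Ó (LATIN CAPITAL LETTER O WITH ACUTE)
    }
}
```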

Answer summary (8 answers)

Answer 1: score 59, accepted
Answer 2: score 28
Answer 3: score 7, 2009-03-01 11:29:17
Answer 4: score 5
Answer 5: score 4
Answer 6: score 2
Answer 7: score 1
Answer 8: score 1