Convert a string containing ASCII to Unicode

Question

I send a string from my HTML page to my Java HttpServlet. In the request I get ASCII codes that display Chinese characters:

"& #21487;& #20197;& #21578;& #35785;& #25105;" (without the spaces)

How can I transform this string into Unicode?

HTML code:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Find information</title>
    <link rel="stylesheet" type="text/css" href="layout.css">
</head>
<body>

<form id="lookupform" name="lookupform" action="LookupServlet" method="post" accept-charset="UTF-8">
    <table id="lookuptable" align="center">
        <tr>
            <label>Question:</label>
            <td><textarea cols="30" rows="2" name="lookupstring" id="lookupstring"></textarea></td>
        </tr>
    </table>
    <input type="submit" name="Look up" id="lookup" value="Look up"/>
</form>

Java code:

request.setCharacterEncoding("UTF-8");
javax.servlet.http.HttpSession session = request.getSession();
LoginResult lr = (LoginResult) session.getAttribute("loginResult");
String[] question = request.getParameterValues("lookupstring");

If I print question[0], I get this value: "& #21487;& #20197;& #21578;& #35785;& #25105;"

Answer 1

There is no such thing as ASCII codes that display Chinese characters. ASCII does not represent Chinese characters.

If you already have a Java string, it already has an internal representation of all characters (US, Latin, Chinese). You can then encode that Java string into bytes using a Unicode encoding such as UTF-8 or UTF-16:

~~String s = "可以告诉我";~~ (EDIT: This line won't display correctly on systems that lack fonts for Chinese characters.)

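String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
// getBytes returns a byte array; the String overload throws the checked UnsupportedEncodingException
byte[] utfString = s.getBytes("UTF-8");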
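
To make the difference between the two encodings concrete, here is a minimal sketch that encodes the same five characters both ways and compares the byte counts. The class and method names (`EncodeDemo`, `utf8`, `utf16`) are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

final class EncodeDemo {
    // Encode to UTF-8; the Charset overload of getBytes throws no checked exception.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode to big-endian UTF-16 without a byte order mark.
    static byte[] utf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
        System.out.println(s.length());       // number of UTF-16 code units in the string
        System.out.println(utf8(s).length);   // these characters take 3 bytes each in UTF-8
        System.out.println(utf16(s).length);  // and 2 bytes each in UTF-16
    }
}
```

The string itself is the same in both cases; only the byte representation chosen at the boundary (file, socket, HTTP response) changes.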

Now that I look at your updated question, you might be looking for the StringEscapeUtils class. It's from Apache Commons Lang (the unescapeHtml method shown here is the Commons Lang 2.x name; Commons Lang 3 and Commons Text call it unescapeHtml4). It will unescape your HTML entities into a Java string:

String s = StringEscapeUtils.unescapeHtml("& #21487;& #20197;& #21578;& #35785;& #25105;"); // without spaces
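
If pulling in a library is not an option, the decimal numeric references in the question can also be decoded by hand. The following is only a rough sketch, not what StringEscapeUtils does internally: the class `NumericEntityDecoder` and its `decode` method are hypothetical names, and it handles only decimal references of the form `&#NNNN;`, not named entities such as `&amp;` or hexadecimal references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDecoder {
    // Matches decimal numeric character references such as &#21487;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d+);");

    static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(input, last, m.start());
            try {
                // The digits are the Unicode code point in decimal.
                out.appendCodePoint(Integer.parseInt(m.group(1)));
            } catch (IllegalArgumentException e) {
                // Number too large or not a valid code point: keep the original text.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#21487;&#20197;&#21578;&#35785;&#25105;"));
    }
}
```

For anything beyond this narrow case, a maintained library is the safer choice, since HTML defines a large set of named entities and edge cases.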

Answer 2

A Java String contains Unicode characters. The decoding took place when the string was constructed.
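
As a small illustration of that point, the sketch below builds a String from raw UTF-8 bytes; the charset matters only at that moment, and afterwards the String simply holds the character. The names `DecodeDemo` and `fromUtf8` are invented for this example, and it assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decoding happens here, while the String is being constructed.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three UTF-8 bytes that encode U+53EF.
        byte[] raw = {(byte) 0xE5, (byte) 0x8F, (byte) 0xAF};
        String s = fromUtf8(raw);
        System.out.println(s.length());                          // one UTF-16 code unit
        System.out.println(Integer.toHexString(s.codePointAt(0))); // the code point in hex
    }
}
```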
