Convert a string with Unicode characters in decimal format (HTML-encoded) to a regular string

I have a variable of type Map.

if (sourceMap.containsKey(currentRow)) {
    //Remove the row from Map
} else {
    //Mismatch
}

where sourceMap is a HashMap variable that contains many strings, such as:

Period Name
Person Last Name
Person First Name
Order Code
Ship_to_Customer_Name
Sub_Profit_Center
Commission Amount
Credit Amount
Rate Amount
Apr-09
Morgan
Martin
1022334852
Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.

and currentRow contains the following string:

Carl Zeiss de Mýxico, S.A. de C.V.

which is the same as the last row above. My requirement is that the two should match, but at the moment they do not. What conversion do I have to do to make them match?

These strings come from different files. The first one is downloaded as CSV, so it contains no Unicode characters. The second one (currentRow) is downloaded in Unicode txt format and converted to CSV using dos2unix.

Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.

This is a string with HTML-encoded characters in it. You can do an HTML-unescape using a utility function such as unescapeHtml4.

Generally you want to keep your strings in raw form rather than with HTML escapes in them. Look at wherever you got your sourceMap from: if you control that and can fix it to avoid the gratuitous escaping, then this would have just worked. Note also that &#195;&#189; decodes to Ã½, which, whether HTML-encoded or not, looks like evidence of mishandling Unicode characters somewhere else in the stack: Ã½ is what the two UTF-8 bytes of ý (0xC3 0xBD) look like when they are read as ISO-8859-1.
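The "mishandled Unicode" point can be checked with the JDK alone. The following is a minimal sketch, assuming the text was originally UTF-8 and was read as ISO-8859-1 at some earlier step (an assumption, since the question does not say where the damage happened). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepairSketch {

    // Reverses one round of "UTF-8 bytes decoded as ISO-8859-1".
    // Only meaningful when every char in s is <= U+00FF and the resulting
    // bytes form valid UTF-8; anything else is replaced with a substitute
    // character, so do not apply this blindly to clean text.
    static String repairLatin1Mojibake(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 U+00BD is what &#195;&#189; decodes to.
        String damaged = "M\u00C3\u00BDxico";
        String repaired = repairLatin1Mojibake(damaged);
        // U+00FD is the single character y-acute.
        System.out.println(repaired.equals("M\u00FDxico")); // prints true
    }
}
```

Fixing the charset at the point where the file is first read is preferable to repairing strings afterwards; the repair step is only a fallback when the upstream export cannot be changed.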

First, download the Apache Commons Lang 3 jar file, for example from the following URL: http://www.java2s.com/Code/Jar/c/Downloadcommonlang3jar.htm

Now add the import statement as follows:

import static org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4;

Now call the method wherever you want to unescape the HTML-encoded string, for example:

String s = "Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.";
System.out.println("Before: " + s);
s = unescapeHtml4(s); // turns numeric references such as &#195; into the characters they name
System.out.println("After: " + s);

The output will be as follows:

Before: Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.
After: Carl Zeiss de MÃ½xico, S.A. de C.V.
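Applied to the lookup from the question, one option is to decode the keys once when building the map, so that containsKey compares like with like. The sketch below uses only the JDK and handles just decimal references of the form &#NNN;. It is a stand-in for unescapeHtml4, not a replacement for it, since the library call also covers named and hexadecimal entities. The class name, method names, and the String value type of the map are hypothetical, and currentRow is assumed to hold the same characters the decoded key does; any charset repair would be an extra step in the same place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NormalizedLookupSketch {

    // Matches decimal numeric character references such as &#195;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Replaces each decimal reference with the character it names.
    // References that are not valid code points are left untouched.
    static String decodeDecimalRefs(String s) {
        Matcher m = DECIMAL_REF.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    // Builds a copy of the map whose keys have been decoded.
    static Map<String, String> withDecodedKeys(Map<String, String> source) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            result.put(decodeDecimalRefs(e.getKey()), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> sourceMap = new HashMap<>();
        sourceMap.put("Carl Zeiss de M&#195;&#189;xico, S.A. de C.V.", "placeholder value");

        Map<String, String> normalized = withDecodedKeys(sourceMap);
        String currentRow = "Carl Zeiss de M\u00C3\u00BDxico, S.A. de C.V.";

        if (normalized.containsKey(currentRow)) {
            normalized.remove(currentRow); // remove the row from the map
            System.out.println("match");
        } else {
            System.out.println("mismatch");
        }
    }
}
```

Decoding once at load time keeps the per-row check a plain hash lookup, instead of unescaping on every comparison.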
