
Convert accent characters to English using Java

I have a requirement to support searching with accented characters, for users from Iceland and Japan. The code I wrote works for a few accented characters but not all. Examples below:

À - returns a. Correct.
 - returns a. Correct.
Ð - returns Ð. This is breaking. It should return e.
Õ - returns Õ. This is breaking. It should return o.

Below is my code:

String accentConvertStr = StringUtils.stripAccents(myKey);

I also tried this:

byte[] b = key.getBytes("Cp1252");
System.out.println("" + new String(b, StandardCharsets.UTF_8));

Please advise.

I would say it works as expected. The underlying code of StringUtils.stripAccents is essentially the following.

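import java.text.Normalizer;

String[] chars = {"À", "Â", "Ð", "Õ"};

for (String c : chars) {
  // NFD splits a letter into its base letter plus combining marks, where such a decomposition exists
  String normalized = Normalizer.normalize(c, Normalizer.Form.NFD);
  // Drop the combining marks so only the base letter remains
  System.out.println(normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
}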

This prints A, A, Ð and O, one per line.
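To see why Ð survives, it can help to print the code points of the NFD form. The following is a minimal sketch using only java.text.Normalizer; the class and method names (DecompositionDemo, nfdCodePoints) are made up for illustration and are not part of Commons Lang.

```java
import java.text.Normalizer;

public class DecompositionDemo {
    // Hypothetical helper: returns the NFD form of s as space-separated U+XXXX code points.
    static String nfdCodePoints(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the file compiles regardless of source encoding:
        // À, Â, Ð, Õ
        String[] chars = {"\u00C0", "\u00C2", "\u00D0", "\u00D5"};
        for (String c : chars) {
            System.out.println(c + " -> " + nfdCodePoints(c));
        }
    }
}
```

À, Â and Õ each decompose into a base letter followed by a combining mark (for example U+0041 U+0300 for À), so the regex has something to remove. Ð stays as the single code point U+00D0 because it has no canonical decomposition, so nothing is stripped.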

If you read the answer at https://stackoverflow.com/a/5697575/9671280, you will find:

Be aware that that will not remove what you might think of as “accent” marks from all characters. There are many it will not do this for. For example, you cannot convert Đ to D or ø to o that way. For that, you need to reduce code points to those that match the same primary collation strength in the Unicode Collation Table.

You could handle those characters separately if you still want to use StringUtils.stripAccents.
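One way to handle them separately is a small fallback table applied after the accents are stripped. This is only a sketch under stated assumptions: the class name AccentFolder, the method fold, and the table contents are hypothetical, the table is deliberately incomplete, and the replacements chosen (for example Ð to D) are one common transliteration that you may need to change for your search requirements. The first step uses java.text.Normalizer to keep the example self-contained; StringUtils.stripAccents could be used there instead.

```java
import java.text.Normalizer;
import java.util.Map;

public class AccentFolder {
    // Hypothetical, incomplete fallback table for letters that have no
    // canonical decomposition. The chosen replacements are assumptions.
    private static final Map<Character, String> FALLBACK = Map.of(
        '\u00D0', "D",  // Ð
        '\u00F0', "d",  // ð
        '\u0110', "D",  // Đ
        '\u0111', "d",  // đ
        '\u00D8', "O",  // Ø
        '\u00F8', "o",  // ø
        '\u00DE', "Th", // Þ
        '\u00FE', "th"  // þ
    );

    static String fold(String input) {
        // Step 1: decompose, then drop the combining diacritical marks.
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");

        // Step 2: replace the remaining letters that normalization cannot split.
        StringBuilder out = new StringBuilder(stripped.length());
        for (int i = 0; i < stripped.length(); i++) {
            char ch = stripped.charAt(i);
            String mapped = FALLBACK.get(ch);
            out.append(mapped != null ? mapped : String.valueOf(ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // À, Â, Ð, Õ written as Unicode escapes
        System.out.println(fold("\u00C0\u00C2\u00D0\u00D5"));
    }
}
```

Characters that are neither decomposable nor listed in the table pass through unchanged, so the table has to be extended for every extra letter you want folded.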

Please try https://github.com/xuender/unidecode; it seems to work for your case.

 String normalized = Unidecode.decode(input);
