How to replace a non-English character with an English character

I have a weird problem. I'm getting text from Google Cloud Vision that contains non-English characters, although the characters in the source are actually English ones. This is a mistake made by Google Cloud Vision's OCR.

I'm getting a word like this: Héllo

Notice that é is a non-English character.

I want to convert it into plain "Hello" so I can process the word.

I'm not looking for a programming answer. I'm just looking for ways to do this.

Any hint would be useful.

Thanks!

If Apache Commons is an option for you, you could make use of the StringUtils class from Commons Lang. Its stripAccents method should suit your needs. From the source code you can see that it actually makes use of java.text.Normalizer, so you could also look into that.
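For anyone who would rather not add a dependency, here is a minimal sketch of the java.text.Normalizer approach on its own. The class name AccentStripper and its stripAccents helper are made up for this illustration; this is not the Commons Lang source. The idea is to decompose each accented letter into a base letter plus combining marks (NFD form) and then delete the marks.

```java
import java.text.Normalizer;

// Hypothetical helper class, named for illustration only.
class AccentStripper {

    // Decompose to NFD so that an accented letter becomes
    // base letter + combining mark(s), then drop every combining mark.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "H\u00e9llo" is "Héllo", written with an escape to avoid
        // source-file encoding problems.
        System.out.println(stripAccents("H\u00e9llo")); // prints Hello
    }
}
```

Note that this only handles characters that have a canonical decomposition. A letter such as ø has none, so it would pass through unchanged and would need an explicit replacement table.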
