
How to replace a non-English character with an English character

I have a strange problem. The text I get from Google Cloud Vision contains non-English characters where the source actually has plain English characters. It is a mistake made by the Google Cloud Vision OCR.

For example, I get a word like this: Héllo

Notice that é is not an English character.

I want to convert it into plain "Hello" so I can process the word.

I'm not looking for a programming answer. I'm just looking for ways to do this.

Any hint would be useful.

Thanks!

If Apache Commons is an option for you, you could use the StringUtils class from Apache Commons Lang. Its stripAccents method should suit your needs. From the source code you can see that it uses java.text.Normalizer, so you could also look into that.
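If you would rather avoid the extra dependency, here is a minimal sketch of the java.text.Normalizer approach. The class name AccentStripper and its stripAccents method are made up for illustration and are not the Commons implementation. The idea is to decompose each character into a base letter plus combining marks (NFD), then drop the marks.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not the Apache Commons implementation.
final class AccentStripper {

    // Decomposes accented characters (NFD), then removes the combining marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks, such as the acute accent U+0301.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "\u00e9" is the precomposed character é.
        System.out.println(stripAccents("H\u00e9llo")); // prints "Hello"
    }
}
```

Note that this only handles characters that have a canonical decomposition. A letter such as ø has none and passes through unchanged, so depending on what the OCR produces you may still need a small lookup table for those cases.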


 