How to detect an appropriate String locale in Java

In my current project I need to lowercase incoming text, which may be in English, German, or Turkish. Plain String#toLowerCase() fails for some characters of the Turkish alphabet: for example, the non-ASCII character http://unicode-table.com/en/0130/ has to be mapped to the ASCII character http://unicode-table.com/en/0069/ . Java 7 handles this mapping without any issues if I provide the locale, i.e. str.toLowerCase(new Locale("tr")). But in that case it looks like I have to detect the appropriate locale of the given text, because it could be written in any of the three languages.
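To make the problem concrete, here is a minimal sketch of the behaviour described above. The class name TurkishLowerCaseDemo is just an illustration, and Locale.forLanguageTag("tr") is used as a stand-in for new Locale("tr") (both give the Turkish locale). Locale.ROOT is passed explicitly so the result does not depend on the machine's default locale.

```java
import java.util.Locale;

final class TurkishLowerCaseDemo {
    // Turkish locale; same language as new Locale("tr") in the question.
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    public static void main(String[] args) {
        String dottedCapitalI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // Without Turkish rules the result is 'i' followed by U+0307 (combining dot above).
        String rootResult = dottedCapitalI.toLowerCase(Locale.ROOT);
        System.out.println(rootResult.length()); // prints 2

        // With Turkish rules the result is the plain ASCII 'i' (U+0069).
        String turkishResult = dottedCapitalI.toLowerCase(TURKISH);
        System.out.println(turkishResult.equals("i")); // prints true

        // Turkish rules also change ASCII input: 'I' becomes dotless 'ı' (U+0131).
        System.out.println("I".toLowerCase(TURKISH).equals("\u0131")); // prints true
    }
}
```

Note the last line: applying the Turkish locale to English or German text would turn every capital I into a dotless ı, which is why the locale cannot simply be hard-coded to Turkish.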

Is there any way to perform the appropriate locale detection or is this way wrong?

EDIT 1

I didn't mention the actual use case: I'm adding tags to an entity via the REST API, and I guess I'm not allowed to change the API contract.

There are libraries that use heuristics to detect a language with a certain probability. An example can be found here.

There is probably a library that does this, but I don't know of one. I can, however, offer you a simple solution.

Both Turkish and German have a few special characters. All other characters are plain English letters, for which the problem doesn't arise. So you can keep a list of the special German and Turkish characters and detect the locale of the current string by searching it for these characters. If one of the Turkish characters is found, process the string in the Turkish locale; the same goes for German. If none of the special characters is found, use the default locale.

This solution has some performance penalty, because you scan the string twice, but that doesn't matter for most applications.
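The character-list idea could be sketched roughly like this. Everything here is an assumption for illustration: the class and method names (TagLocaleGuesser, guessLocale, lowerCase) are made up, the character lists are my own pick, and Locale.ENGLISH stands in for the "default locale". Because ö and ü occur in both German and Turkish, only letters unique to one language are listed.

```java
import java.util.Locale;

final class TagLocaleGuesser {
    // Assumption: letters used in Turkish but not in German or English.
    // c-cedilla, g-breve, dotless i, dotted capital I, s-cedilla (both cases where they exist).
    private static final String TURKISH_ONLY =
            "\u00E7\u00C7\u011F\u011E\u0131\u0130\u015F\u015E";

    // Assumption: letters used in German but not in Turkish or English.
    // a-umlaut (both cases) and sharp s.
    private static final String GERMAN_ONLY = "\u00E4\u00C4\u00DF";

    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Returns the Turkish locale as soon as a Turkish-only letter is seen,
    // German if only German-only letters were seen, otherwise the fallback.
    static Locale guessLocale(String text) {
        boolean sawGerman = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (TURKISH_ONLY.indexOf(c) >= 0) {
                return TURKISH;
            }
            if (GERMAN_ONLY.indexOf(c) >= 0) {
                sawGerman = true;
            }
        }
        return sawGerman ? Locale.GERMAN : Locale.ENGLISH; // fallback is an assumption
    }

    // First pass guesses the locale, second pass does the actual lowercasing.
    static String lowerCase(String text) {
        return text.toLowerCase(guessLocale(text));
    }
}
```

For example, TagLocaleGuesser.lowerCase("\u0130STANBUL") is lowercased with Turkish rules, while "TITLE" falls back to English. Keep in mind that this is only a heuristic: a Turkish word written entirely in ASCII capitals contains none of the listed characters, so it would be lowercased with the fallback locale.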
