
How to find the locale from an encoding in Java

I have a component that should be able to parse and process any XML file given by a user. The XML file can contain Timestamp values like "12 March 2012 05:00 pm", so the user has to supply a Timestamp pattern that SimpleDateFormat accepts. We use that pattern with SimpleDateFormat to parse the Timestamp values like this:

 SimpleDateFormat sdt = new SimpleDateFormat(inputTimestampPattern);
 Date date = sdt.parse(inputTimestampString);

But for one specific file we get a ParseException like the one below.

java.text.ParseException: Unparseable date: " 04-6\埖 -12 18.54:57.169000 \和\怜"

We got this exception when we ran the component in a Japanese locale with an input file containing timestamps written for a Chinese locale. The JVM's locale is Japanese, so SimpleDateFormat tries to parse the timestamp string assuming a Japanese Locale and fails. The XML file declares its encoding like this:

  <?xml version="1.0" encoding="gbk"?>

If we could somehow work out the Locale from the encoding value, we could create a Locale-sensitive SimpleDateFormat object, which would fix this issue. So my question is: can we get Locale information from the encoding? I'm not asking for the exact Locale. Even if there is only a way to get a small set of possible Locales for a given encoding, I can try each of them until one parses without throwing the Exception. Is there any API in Java that helps here?

Or is there any better way to address this issue?

If the encoding is declared in the first line of the XML, you can read just that first line of the file and pick up the encoding="gbk" (or whatever it is). Then choose the Locale for that encoding in your program with a switch-case or however you want.
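
A minimal sketch of that idea, under some loud assumptions: as far as I know the JDK has no API that maps an encoding to a Locale, so the GUESSES table below is hand-written and hypothetical, and an encoding is only a hint (a GBK file can still contain English dates). The class and method names (EncodingLocaleGuess, encodingFromDeclaration, candidateLocales, parseWithCandidates) are made up for illustration. It pulls the encoding out of the declaration line with a regex, looks up candidate Locales, and tries each one with SimpleDateFormat(pattern, locale) until one parses.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingLocaleGuess {

    // Matches encoding="..." or encoding='...' inside an XML declaration.
    private static final Pattern ENCODING =
        Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Hand-written guesses (an assumption of this sketch, not a JDK table).
    // Keys are upper-cased encoding names; extend as needed.
    private static final Map<String, List<Locale>> GUESSES = Map.of(
        "GBK", List.of(Locale.SIMPLIFIED_CHINESE, Locale.CHINESE),
        "GB2312", List.of(Locale.SIMPLIFIED_CHINESE, Locale.CHINESE),
        "GB18030", List.of(Locale.SIMPLIFIED_CHINESE, Locale.CHINESE),
        "BIG5", List.of(Locale.TRADITIONAL_CHINESE, Locale.CHINESE),
        "SHIFT_JIS", List.of(Locale.JAPANESE),
        "EUC-JP", List.of(Locale.JAPANESE),
        "EUC-KR", List.of(Locale.KOREAN));

    // Returns the declared encoding exactly as written, or null if none is found.
    static String encodingFromDeclaration(String firstLine) {
        Matcher m = ENCODING.matcher(firstLine);
        return m.find() ? m.group(1) : null;
    }

    // Returns the guessed Locales for an encoding, falling back to the
    // JVM default plus English when the encoding is unknown or missing.
    static List<Locale> candidateLocales(String encoding) {
        List<Locale> fallback = List.of(Locale.getDefault(), Locale.ENGLISH);
        if (encoding == null) {
            return fallback;
        }
        return GUESSES.getOrDefault(encoding.toUpperCase(Locale.ROOT), fallback);
    }

    // Tries each candidate Locale in order; returns the first successful parse,
    // or an empty Optional if every candidate throws ParseException.
    static Optional<Date> parseWithCandidates(String text, String pattern,
                                              List<Locale> candidates) {
        for (Locale locale : candidates) {
            try {
                return Optional.of(new SimpleDateFormat(pattern, locale).parse(text));
            } catch (ParseException e) {
                // This Locale does not fit the text; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String enc = encodingFromDeclaration("<?xml version=\"1.0\" encoding=\"gbk\"?>");
        System.out.println(enc + " -> " + candidateLocales(enc));
        System.out.println(parseWithCandidates("12 March 2012 05:00 pm",
            "dd MMMM yyyy hh:mm a", List.of(Locale.JAPANESE, Locale.ENGLISH)));
    }
}
```

Reading only the first line is naive: it assumes the declaration sits on one line in an ASCII-compatible encoding. It may be simpler and more reliable to let the user supply the Locale together with the Timestamp pattern, since they already have to provide the pattern.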


 