How to find locale from encoding in Java

I have a component that should be able to parse and process any XML file given by a user. The XML file can contain timestamp values like "12 March 2012 05:00 pm", so the user has to supply a timestamp pattern that is acceptable to SimpleDateFormat. We use the pattern and SimpleDateFormat to parse the timestamp values like this:

 SimpleDateFormat sdt = new SimpleDateFormat(inputTimestampPattern);
 Date date = sdt.parse(inputTimestampString);

But for one specific file we get a ParseException like the one below.

java.text.ParseException: Unparseable date: " 04-6\u57d6 -12 18.54:57.169000 \u548c\u601c"

We got this exception when we ran the component in a Japanese locale with an input file whose timestamps are written in a Chinese locale. The JVM's locale is Japanese, so SimpleDateFormat tries to parse the timestamp string assuming the Japanese locale and fails. The XML file carries its encoding information like this:

  <?xml version="1.0" encoding="gbk"?>

If we could somehow figure out the Locale from the encoding value, we could create a Locale-sensitive SimpleDateFormat object, which would fix this issue. So my question is: can we get Locale information from the encoding? I'm not asking for the exact Locale. Even if there is only a way to get a small set of possible Locales for a given encoding, I can try each of them until one doesn't throw the exception. Is there any API in Java that helps here?

Or is there a better way to address this issue?
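The "try each candidate Locale until one parses" idea from the question could look roughly like the sketch below. It relies only on the two-argument `SimpleDateFormat(String, Locale)` constructor. The class name `LocaleFallbackParser`, the method name `parseWithCandidates`, and the choice to turn off lenient parsing are assumptions made for illustration, not part of the original component.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Locale;

public class LocaleFallbackParser {

    // Hypothetical helper: tries each candidate locale in order and returns
    // the first successful parse. Rethrows the last ParseException if every
    // candidate fails.
    static Date parseWithCandidates(String pattern, String text, List<Locale> candidates)
            throws ParseException {
        ParseException last = null;
        for (Locale locale : candidates) {
            try {
                // Locale-sensitive formatter instead of the JVM default locale.
                SimpleDateFormat sdf = new SimpleDateFormat(pattern, locale);
                sdf.setLenient(false); // assumption: strict parsing is wanted
                return sdf.parse(text);
            } catch (ParseException e) {
                last = e; // remember the failure and try the next locale
            }
        }
        if (last != null) {
            throw last;
        }
        throw new ParseException("No candidate locales supplied", 0);
    }
}
```

A caller might pass something like `Arrays.asList(Locale.JAPANESE, Locale.SIMPLIFIED_CHINESE)`. Note that the first locale that parses without error is not guaranteed to be the one the author of the file intended, so the order of candidates matters.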

If the encoding is declared in the first line of the XML, you can read the file once beforehand, taking only the first line, so that you catch the encoding="gbk" (or whatever it is). Then set it in the program with a switch-case, or however you want.
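A minimal sketch of that suggestion follows, assuming the caller has already read the first line of the file as a string. The encoding-to-Locale table is a heuristic invented for illustration: an encoding does not determine a locale, and the class and method names (`XmlEncodingLocaleGuess`, `declaredEncoding`, `guessLocale`) are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingLocaleGuess {

    // Matches encoding="..." or encoding='...' inside an XML declaration.
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Returns the declared encoding name, or null if the line has none.
    static String declaredEncoding(String firstLine) {
        Matcher m = ENCODING.matcher(firstLine);
        return m.find() ? m.group(1) : null;
    }

    // Assumption: a hand-written guess table. Unknown or missing encodings
    // fall back to the JVM default locale.
    static Locale guessLocale(String encoding) {
        if (encoding == null) {
            return Locale.getDefault();
        }
        switch (encoding.toLowerCase(Locale.ROOT)) {
            case "gbk":
            case "gb2312":
            case "gb18030":
                return Locale.SIMPLIFIED_CHINESE;
            case "big5":
                return Locale.TRADITIONAL_CHINESE;
            case "shift_jis":
            case "euc-jp":
                return Locale.JAPAN;
            case "euc-kr":
                return Locale.KOREA;
            default:
                return Locale.getDefault();
        }
    }
}
```

The guessed Locale can then be passed to `new SimpleDateFormat(inputTimestampPattern, locale)`. Files in Unicode encodings such as UTF-8 carry no hint about the language, so this approach only helps for legacy, region-specific encodings.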
