Encoding issue when upgrading from Websphere 7 to Websphere 8.5.5

We recently moved an application from WAS 7.0 (on AIX) to WAS 8.5.5 (on Linux). It interfaces with a couple of applications that send it data as XML.

The XML is read from the request using:

    while ((i = request.getReader().read(buf, 0, buf.length)) != -1) {
        sb.append(buf, 0, i);
    }

However, after the migration we noticed that the application no longer handles special characters such as è or © correctly: they come out garbled.

This looks to me like an encoding issue. Can anyone point out what needs to be checked to understand the root cause?

Reading further on this, I see that I can set the JVM argument

-Dclient.encoding.override=UTF-8

to always use UTF-8. Is this a good practice?

Edit:

Locale output on Linux:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Locale output on AIX:

LANG=en_US
LC_COLLATE="en_US"
LC_CTYPE="en_US"
LC_MONETARY="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_MESSAGES="en_US"
LC_ALL=

One application sends the XML as <?xml version="1.0" encoding="ISO-8859-1"?> and the other sends it as <?xml version="1.0">

After applying the JVM argument above, the payload declared as <?xml version="1.0"> is handled correctly, but the one declared with encoding="ISO-8859-1" is not. I am totally lost here.

It seems your application is not written to use a specific encoding and therefore falls back to the default of your session.
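To see why a changed default matters, here is a minimal sketch in plain Java (not WebSphere-specific; the class and method names are made up for illustration). It decodes the same bytes with two different charsets, which is roughly what happens when the sender encodes in one charset and the receiver silently assumes another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset mismatch garbles text.
public class CharsetMismatchDemo {

    // Encode the text with one charset, then decode the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u00e8\u00a9"; // the characters è and ©

        // Same charset on both sides: the text survives.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("matching charsets keep the text: " + same.equals(original));

        // Sender uses UTF-8, receiver assumes ISO-8859-1: each character
        // turns into two unrelated characters.
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("mismatched charsets keep the text: " + garbled.equals(original));
        System.out.println("length before=" + original.length() + ", after=" + garbled.length());
    }
}
```

The string literals use Unicode escapes so the result does not depend on the encoding of the source file itself.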

Check the locale on both AIX and Linux with the locale command. On Linux it might be something like LANG=en_US.UTF-8.
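Alongside the shell-level check, it can help to print what the JVM itself reports on each machine. The sketch below uses only standard JDK calls; the class name is hypothetical, and whether the application server's request decoding actually follows these defaults is an assumption you would need to verify for your WebSphere version.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: reports the JVM's view of the default encoding and locale.
public class ShowJvmDefaults {

    static String describeDefaults() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", locale=" + Locale.getDefault();
    }

    public static void main(String[] args) {
        // Run this with the same JVM and environment as the server on both hosts
        // and compare the two lines.
        System.out.println(describeDefaults());
    }
}
```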

To make your application behave on Linux the way it did on AIX, set the locale on Linux to the same value as on AIX.

Making applications Unicode-aware is generally a good idea. But there are exceptions where you need to stick to another encoding, e.g. Latin-1 (ISO-8859-1) for some legacy systems. In those cases your code needs to select that encoding explicitly wherever it is needed.
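As a rough illustration of choosing the encoding explicitly, the sketch below mirrors the read loop from the question but builds the reader from raw bytes with a named charset instead of relying on any default. It is plain Java with a hypothetical class name, and a ByteArrayInputStream stands in for the request body. In a servlet, the analogous step would be calling request.setCharacterEncoding(...) before the first request.getReader() call, or wrapping request.getInputStream() yourself; that part has not been tested here on WebSphere.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a stream fully using an explicitly chosen charset.
public class ExplicitCharsetRead {

    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        // The charset is passed in, so the platform default is never consulted.
        try (Reader reader = new InputStreamReader(in, charset)) {
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xE8 and 0xA9 are the ISO-8859-1 bytes for è and ©.
        byte[] latin1Body = {(byte) 0xE8, (byte) 0xA9};
        String text = readAll(new ByteArrayInputStream(latin1Body), StandardCharsets.ISO_8859_1);
        System.out.println("decoded correctly: " + text.equals("\u00e8\u00a9"));
    }
}
```

Because the two senders in the question use different encodings, a single global override cannot suit both; the charset would have to be chosen per sender, or the raw bytes handed to an XML parser so that it can honor the declaration in the document.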
