DB2 insert UTF-8 characters on non-Unicode database with ALT_COLLATE UNICODE

I am trying to insert Chinese text into a DB2 database, but it is not working.

The database is configured by default as ANSI (en_US, code page 819), which is a requirement for other applications that use the same database. ALT_COLLATE IDENTITY_16BIT is defined and the Unicode tables are created using CCSID UNICODE, but Unicode characters for Chinese or Korean are not inserted.

Example table:

CREATE TABLE LANGS (
    IDIOMA  char(2) NOT NULL,
    PAIS    char(2) NOT NULL,
    TRADUC  long varchar NOT NULL
) CCSID UNICODE;

Example insert:

INSERT INTO LANGS (IDIOMA,PAIS,TRADUC) VALUES ('zh','TW','其他');

System Information:

  • Server: DB2 9.7 on Ubuntu 64bit (en_US)
  • Client: Windows 7 32bit (es_ES), Java 7 with db2jcc.jar

Example Java extract:

Class.forName("com.ibm.db2.jcc.DB2Driver");

...

Properties props = new Properties();
props.setProperty("user", user);
props.setProperty("password", pass);
props.setProperty("DB2CODEPAGE", "1208");
props.setProperty("retrieveMessagesFromServerOnGetMessage", "true");

con = DriverManager.getConnection(url, props);

...

Statement statement = con.createStatement();
statement.execute(sql);

...
statement.close();
con.close();

DB2 database locale configuration (from GET DB CFG):

Database territory                                      = en_US;
Database code page                                      = 819
Database code set                                       = iso8859-1
Database country/region code                            = 1
Database collating sequence                             = UNIQUE
Alternate collating sequence              (ALT_COLLATE) = IDENTITY_16BIT
Database page size                                      = 4096

The statements execute correctly and the rows appear correctly in the database for:

  • en_GB
  • en_US
  • es_ES
  • pt_PT

but not for:

  • cy_GB
  • ko_KR
  • zh_TW

Inserting from the command line with db2cmd does not work for these languages either (it inserts, but with only 1 byte).

Inserting from the command line in a Linux environment localized as zh_TW works. Inserting from the command line in a Linux environment localized as en_US.utf-8 also works.

It never works from Java in these environments.


Using "X" as a prefix for the VARCHAR field is not an option because of some restrictions, and the SQL already works in two environments.

I think it may be an encoding problem on the client or the server, caused by the configuration, the file encoding, or the SQL encoding.


Update:

I also tried loading a UTF-8 file containing the SQL statements. The file loads correctly, and in the debugger the SQL with UTF-8 characters is passed correctly to the Statement, but the result is the same.

new InputStreamReader(new FileInputStream(file),"UTF-8")

...

private void executeLineByLine(Reader reader) throws SQLException {
    StringBuffer command = new StringBuffer();
    try {
        BufferedReader lineReader = new BufferedReader(reader);
        String line;
        while ((line = lineReader.readLine()) != null) {
            command = handleLine(command, line);
        }
        checkForMissingLineTerminator(command);
    } catch (Exception e) {
        String message = "Error executing: " + command + ".  Cause: " + e;
        printlnError(message);
        throw new SQLException(message, e);
    }
}


private StringBuffer handleLine(StringBuffer command, String line) throws SQLException, UnsupportedEncodingException {
    String trimmedLine = line.trim();
    if (lineIsComment(trimmedLine)) {
        println(trimmedLine);
    } else if (commandReadyToExecute(trimmedLine)) {
        command.append(line.substring(0, line.lastIndexOf(delimiter)));
        command.append(LINE_SEPARATOR);
        println(command);
        executeStatement(command.toString());
        command.setLength(0);
    } else if (trimmedLine.length() > 0) {
        command.append(line);
        command.append(LINE_SEPARATOR);
    }
    return command;
}

private void executeStatement(String command) throws SQLException, UnsupportedEncodingException {
    boolean hasResults = false;
    Statement statement = connection.createStatement();
    hasResults = statement.execute(command);
    printResults(statement, hasResults);
    statement.close();
}
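
One way to double-check that the characters really survive the file read, independent of the console or debugger encoding, is to print every non-ASCII character as its Unicode code point. The following is only a sketch: the class `CodePointDump` and the method `escapeNonAscii` are hypothetical names that are not part of the original code, and the script path is assumed to be passed as the first argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original program.
public class CodePointDump {

    /** Renders every non-ASCII character as <U+XXXX> so the output is readable on any console. */
    public static String escapeNonAscii(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append(String.format("<U+%04X>", cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CodePointDump <sql-script>");
            return;
        }
        Path script = Paths.get(args[0]);
        // Decode the file explicitly as UTF-8, never with the platform default charset.
        try (BufferedReader reader = Files.newBufferedReader(script, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(escapeNonAscii(line));
            }
        }
    }
}
```

If the insert statement prints with `<U+5176><U+4ED6>` in place of 其他, the String handed to the Statement is intact and the loss happens later, in the driver or on the server. Unlike `InputStreamReader`, `Files.newBufferedReader` throws an `IOException` on malformed input instead of silently replacing it, which also helps confirm that the file is valid UTF-8.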

Update2:

It's not possible to change the data types. The database is part of other systems and already contains data.

The database is installed on 7 different servers. On three of them, where the data is inserted from the db2 command line using Linux in a UTF-8 shell, the data was inserted correctly.

From the Windows db2 command line or using Java it's not possible to insert the characters correctly.

Changing the Java sources to UTF-8 makes System.out print the SQL correctly, just as I see it when debugging the sql variable.

When I insert this test SQL, it is shown correctly with Chinese characters in System.out and in the Statement's internal variable:

INSERT INTO LANGS (IDIOMA,PAIS,TRADUC) VALUES ('zh','TW','TEST1 其他 FIN TEST1');

But in the database the test appears as:

TEST3  FIN TEST3

HEX representation:

54 45 53 54 33 20 1A 1A 1A 1A 1A 1A 1A 1A 20 46 49 4E 20 54 45 53 54 33
T  E  S  T  3  _  ?  ?  ?  ?  ?  ?  ?  ?  _  F  I  N  _  T  E  S  T  3

I think that the DB2 Java client is probably always using the Windows code page (in this case ISO-8859-1 or cp1252) instead of UTF-8, or the server is converting the data using the main collation instead of the alternate collation of the table.
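
The 1A bytes in the dump are the ASCII SUB (substitute) control character, which is consistent with a lossy conversion into a single-byte code page somewhere between the client and the table. As a small illustration using only the standard Java charset API (the class `EncodingCheck` and its methods are hypothetical names introduced here), the sketch below shows what the two Chinese characters look like in UTF-8 and that ISO-8859-1 cannot represent them at all:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original program.
public class EncodingCheck {

    /** Formats bytes as space-separated upper-case hex pairs. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns true if every character of text can be represented in the given charset. */
    public static boolean fitsIn(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the result independent of the source file encoding.
        String text = "\u5176\u4ED6"; // the two characters from the test insert
        System.out.println("UTF-8 bytes:        " + toHex(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("fits in ISO-8859-1: " + fitsIn(text, StandardCharsets.ISO_8859_1));
        System.out.println("fits in UTF-8:      " + fitsIn(text, StandardCharsets.UTF_8));
    }
}
```

In UTF-8 the two characters are the six bytes E5 85 B6 E4 BB 96, none of which appear in the stored value, so the data was replaced during a conversion rather than stored under a different interpretation.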

Update3:

I installed a Java SQL tool called DbVisualizer. Using this tool on Windows, when I paste the SQL into the SQL panel and run it, it is inserted correctly in the database.

This makes me suspect that it is not a problem of installation or data types. It is probably one of these three factors:

  • Client configuration
  • Server properties sent when the client connects
  • Driver type or version used

The problem was solved using these steps:

  1. Always use db2jcc4.jar (JDBC 4), not db2jcc.jar

    • (In some places JDBC level 2 was configured in the OS classpath with db2jcc instead of db2jcc4)
  2. Set the environment variable DISABLEUNICODE=0
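
Because both jars provide the same driver class, a stale classpath entry is easy to miss. A quick way to see which one the JVM actually picks up is to ask where the class was loaded from. This is only a sketch: `DriverLocation` and `locate` are hypothetical names, the driver class name is the one from the question, and reading DISABLEUNICODE with `System.getenv` simply reports what the process sees.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of the original program.
public class DriverLocation {

    /**
     * Returns the location (normally a jar path) a class was loaded from,
     * or a short explanation when it cannot be determined.
     */
    public static String locate(String className) {
        try {
            // Load without initializing, so no driver registration code runs.
            Class<?> cls = Class.forName(className, false, DriverLocation.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(no code source)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        System.out.println("DB2 driver loaded from: " + locate("com.ibm.db2.jcc.DB2Driver"));
        System.out.println("DISABLEUNICODE = " + System.getenv("DISABLEUNICODE"));
    }
}
```

If the printed location ends in db2jcc.jar rather than db2jcc4.jar, the older driver is still winning on the classpath.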

Complete information about Unicode on DB2 is available on the page "Understanding DB2 Universal Database character conversion".
