
Informix JDBC PySpark Read Results in Column Names as Column Values

I'm reading data from various JDBC sources using PySpark's read method. JDBC reads from Teradata, MySQL, Oracle and SQL Server all work perfectly. However, when I read from Informix, the result contains the column headers as the column values instead of the actual data:

query_cbu = '''
SELECT first 5 
ac2_analysis_p
FROM informix.ac2_aux_cust
        '''

Specifying the header option did not help:

df_cbu = \
      spark.read.format("jdbc") \
      .option("url", url) \
      .option("dbtable", '({}) tbl'.format(query_cbu)) \
      .option("user", db_username) \
      .option("password", db_password) \
      .option("header", "true") \
      .load()

df_cbu.show()

Result:

+--------------+
|ac2_analysis_p|
+--------------+
|ac2_analysis_p|
|ac2_analysis_p|
|ac2_analysis_p|
|ac2_analysis_p|
|ac2_analysis_p|
+--------------+

Using the same JDBC driver (ifxjdbc.jar), DbVisualizer returns the values correctly:

[Screenshot: DbVisualizer showing the actual ac2_analysis_p values]

I can't imagine any mechanism that can cause this. Can anyone advise me where to start looking for the problem?

I believe (I ran into this once some time ago, so I'm going from memory here) that you need to enable DELIMIDENT in your JDBC driver URL:

DELIMIDENT=Y

https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.jdbc_pg.doc/ids_jdbc_040.htm#ids_jdbc_040
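A minimal sketch of where the property goes, assuming the standard Informix URL layout `jdbc:informix-sqli://host:port/database:INFORMIXSERVER=server;KEY=VALUE`. The helper names `informix_jdbc_url` and `read_informix`, and the host, port, database and server values in the usage line, are hypothetical placeholders; the read itself is the same one from the question, only the URL changes.

```python
def informix_jdbc_url(host, port, database, server, **props):
    """Build an Informix JDBC URL; extra keyword arguments (e.g. DELIMIDENT="Y")
    are appended as semicolon-separated KEY=VALUE connection properties."""
    params = {"INFORMIXSERVER": server, **props}
    prop_str = ";".join("{}={}".format(key, value) for key, value in params.items())
    return "jdbc:informix-sqli://{}:{}/{}:{}".format(host, port, database, prop_str)


def read_informix(spark, url, query, user, password):
    # Same JDBC read as in the question; DELIMIDENT now travels inside `url`.
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", "({}) tbl".format(query))
        .option("user", user)
        .option("password", password)
        .load()
    )


# Hypothetical connection details, for illustration only.
url = informix_jdbc_url("dbhost", 9088, "mydb", "ol_informix", DELIMIDENT="Y")
print(url)  # jdbc:informix-sqli://dbhost:9088/mydb:INFORMIXSERVER=ol_informix;DELIMIDENT=Y
```

With a live session you would then call `read_informix(spark, url, query_cbu, db_username, db_password).show()`.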

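The reason is that Spark's JDBC source does not send your query as-is: it wraps it as a subquery and selects each column by name, quoting the names with double quotes (Spark's default identifier quoting, since it has no built-in Informix dialect). The other databases you listed interpret double-quoted names as identifiers (and Spark's MySQL dialect uses backticks instead), so they work. Informix, unless DELIMIDENT is enabled, treats double-quoted text as a string literal, so the generated query returns the constant 'ac2_analysis_p' for every row. Enabling DELIMIDENT makes Informix read double-quoted text as delimited identifiers. DELIMIDENT has other repercussions (for example, string literals in your own SQL must then use single quotes), so make sure it does what you want, but it should be fine to turn it on.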
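To see why the column name shows up as data, here is a simplified reconstruction of the statement Spark's JDBC source sends for a scan. It is only an approximation (the exact text, including any WHERE clause or trailing whitespace, varies by Spark version), and `spark_style_query` is a hypothetical helper, not a Spark API.

```python
def spark_style_query(columns, dbtable, quote='"'):
    """Approximate Spark's JDBC scan: each column is wrapped in the dialect's
    quote character (double quotes by default) and selected FROM the dbtable text."""
    column_list = ", ".join("{q}{c}{q}".format(q=quote, c=c) for c in columns)
    return "SELECT {} FROM {}".format(column_list, dbtable)


query_cbu = "SELECT first 5 ac2_analysis_p FROM informix.ac2_aux_cust"
print(spark_style_query(["ac2_analysis_p"], "({}) tbl".format(query_cbu)))
# SELECT "ac2_analysis_p" FROM (SELECT first 5 ac2_analysis_p FROM informix.ac2_aux_cust) tbl
```

Without DELIMIDENT, Informix parses `"ac2_analysis_p"` in that outer SELECT as the string 'ac2_analysis_p', which is exactly the output in the question; with DELIMIDENT=Y it is read as a column reference.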
