
How to handle white spaces in a varchar NOT NULL column when reading an Azure Synapse table into Spark on Databricks

I have a problem when reading a table from a Synapse database into Spark (using Azure Databricks). The table is defined as follows:

CREATE TABLE A
(
    [ID] [int] NOT NULL,
    [Value] [int] NOT NULL,
    [Description] [nvarchar](30) NOT NULL
)
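The read is done with the Databricks Synapse connector, roughly like this (a sketch for context; the server, database, and storage details are placeholders, not values from the question):

```python
# Sketch of reading the Synapse table into a Spark DataFrame via the
# Databricks Synapse connector. All connection details are placeholders.
df = (spark.read
      .format("com.databricks.spark.sqldw")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
      .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "A")
      .load())
```

The `load()` itself succeeds; the connector only unloads the data through the `tempDir` staging location when an action runs.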

The Description field can be blank (i.e. "") or can contain a blank space. In Synapse I have no problem with this field, nor when I read the table into a Spark DataFrame. The problem arises when I call something like df.show() or df.count(). The following error appears:

Py4JJavaError: An error occurred while calling o1779.showString.
: com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.

Underlying SQLException(s):
  - com.microsoft.sqlserver.jdbc.SQLServerException: Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
Column ordinal: 2, Expected data type: NVARCHAR(30) collate SQL_Latin1_General_CP1_CI_AS NOT NULL. [ErrorCode = 107090] [SQLState = S0001]
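Reading the error, the external reader appears to treat the empty Description as NULL and then reject the row against the column's NOT NULL constraint. The mechanism can be sketched in plain Python (a hypothetical illustration, not actual connector code):

```python
# Hypothetical sketch of how an empty string can be rejected when
# round-tripped through an external table during the connector's unload.

def serialize(value):
    # Assumption: the staging format writes "" the same way it writes NULL.
    return None if value == "" else value

def validate_row(row, not_null_columns):
    # PolyBase-style validation: a NULL in a NOT NULL column rejects the row.
    return all(row[c] is not None for c in not_null_columns)

row = {"ID": 1, "Value": 10, "Description": ""}       # valid in Synapse
staged = {k: serialize(v) for k, v in row.items()}    # "" becomes NULL
ok = validate_row(staged, ["ID", "Value", "Description"])
print(ok)  # False: 1 row rejected out of 1 processed, as in the error
```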

Disclaimer: as it's been four months, I assume you've likely solved this or found a workaround.

I had the same problem; this is a bug in how Databricks handles nulls vs. empty strings when reading from Synapse. The quick fix is to alter your Synapse columns to allow nulls (i.e. change NOT NULL to NULL). Although empty strings are perfectly valid in a Synapse column declared NOT NULL, Databricks applies validation during full reads from Synapse that breaks the read and causes the failure: it is aware of the Synapse schema but misapplies its validation rules. You only see the error on show() or count() because of Spark's lazy execution. Note: I am in the process of filing this bug with Databricks.
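For the table definition above, the workaround looks like this (run in Synapse, then re-read from Databricks):

```sql
-- Allow NULLs on the text column so the connector's validation no longer
-- rejects rows whose empty string round-trips as NULL.
ALTER TABLE A ALTER COLUMN [Description] [nvarchar](30) NULL;
```

If downstream code expects non-null values, you can coalesce the column back to an empty string after the read on the Spark side.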
