How to handle white spaces in varchar not null column from azure synapse table to spark databricks

I have a problem when I read a table in Spark (using Azure Databricks) from a Synapse database. The table is defined as follows:

CREATE TABLE A
(
    [ID] [int] NOT NULL,
    [Value] [int] NOT NULL,
    [Description] [nvarchar](30) NOT NULL
)

The field Description can be blank (i.e. "") or can contain a blank space. In Synapse I have no problem with this field, and neither when I read the table with Spark into a dataframe (a sketch of such a read follows the error below). The problem arises when I call something like df.show() or df.count(). The following error appears:

Py4JJavaError: An error occurred while calling o1779.showString.
: com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.

Underlying SQLException(s):
  - com.microsoft.sqlserver.jdbc.SQLServerException: Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
Column ordinal: 2, Expected data type: NVARCHAR(30) collate SQL_Latin1_General_CP1_CI_AS NOT NULL. [ErrorCode = 107090] [SQLState = S0001]
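
For context, a minimal sketch of such a read with the Azure Synapse connector (the same com.databricks.spark.sqldw source named in the error); the JDBC URL, storage path, and option values here are placeholders, not the asker's actual configuration:

# Read the Synapse table through the Azure Synapse connector.
# The server, database, and staging container below are placeholders.
df = (spark.read
      .format("com.databricks.spark.sqldw")
      .option("url", "jdbc:sqlserver://<server>.sql.azuresynapse.net:1433;database=<db>")
      .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "A")
      .load())

# Nothing is executed yet: Spark builds the plan lazily, and the
# connector's unload query only runs when an action is triggered,
# which is why df.show() or df.count() is where the error surfaces.
df.show()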

Disclaimer: As it's been four months, I assume you've likely solved this or have a workaround.

I had the same problem, and this is a bug in how Databricks handles nulls vs. empty strings when reading from Synapse. The quick fix is to set your Synapse columns to allow nulls (i.e. change NOT NULL to NULL; see the T-SQL sketch below). Though empty strings are 100% valid in a Synapse field that is set to NOT NULL, for some reason Databricks applies validation during full reads from Synapse that breaks the read and causes the failure. It is aware of the Synapse schema but misinterprets and misapplies the validation rules. You only see this when you call show() or count() because of Spark's lazy execution. Note, I am in the process of filing this bug with Databricks.
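
For reference, the schema change described above would look roughly like the following T-SQL run against Synapse (a sketch using the table and column from the question; note that on a dedicated SQL pool ALTER COLUMN has restrictions, e.g. distribution columns cannot be altered, in which case the table would need to be recreated):

-- Relax the NOT NULL constraint so the connector's export
-- no longer rejects rows with empty-string values.
ALTER TABLE A
ALTER COLUMN [Description] [nvarchar](30) NULL;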
