
Databricks - Error writing to Azure Synapse

I am trying to write data to an Azure Synapse table that has an identity column, using the following code.

Code on Databricks:

def get_jdbc_connection(host, sqlDatabase, user, password):
  jdbcHostname = "{}.database.windows.net".format(host)
  jdbc_url = "jdbc:sqlserver://{}:1433;database={};user={}@{};password={};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;".format(jdbcHostname, sqlDatabase, user, host, password)
  url = "jdbc:sqlserver://{}:1433;database={};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;".format(jdbcHostname, sqlDatabase)
  return (jdbc_url,url )

def write_adw(spark, df_target_adw, jdbc_url, table, tempDir, option_mode, pre_Actions ):
    df_target_adw.write.format("com.databricks.spark.sqldw") \
        .option("url", jdbc_url) \
        .option("useAzureMSI", "true") \
        .option("preActions", pre_Actions) \
        .option("dbTable", table) \
        .option("tempDir", tempDir) \
        .mode(option_mode) \
        .save()
    
dftraffic = spark.sql('SELECT distinct SourceName\
              ,1 AS IsActiveRow \
             ,"Pipe-123" as pipelineId \
              ,current_timestamp as ADFCreatedDateTime \
              ,current_timestamp as ADFModifiedDateTime \
              from deltaTable')

#write to ADW
(jdbc_url, url_adw) = get_jdbc_connection(host, sqlDatawarehouse, user, password)
target_table = 'TargetTable_name'
option_mode = "append"
pre_Actions = "SELECT GETDATE()"
write_adw(spark, dftraffic, jdbc_url, target_table, tempDir, option_mode, pre_Actions)

Schema of the target table on ADW:

Column Name           Data Type
SourceSID             INT IDENTITY(1,1) NOT NULL
SourceName            VARCHAR(20) NOT NULL
IsRowActive           BIT NOT NULL
PipelineId            VARCHAR(20) NOT NULL
ADFCreatedDateTime    DATETIME NOT NULL
ADFModifiedDateTime   DATETIME NOT NULL

Configuration details on Databricks:

Databricks runtime 7.4 (includes Apache Spark 3.0.1, Scala 2.12)

Error message:

Py4JJavaError: An error occurred while calling o457.save.
: com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
- com.microsoft.sqlserver.jdbc.SQLServerException: An explicit value for the identity column in table can only be specified when a column list is used and IDENTITY_INSERT is ON
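The rule the error quotes is standard SQL Server behaviour for identity columns. Written out by hand it looks roughly like this (illustrative T-SQL against the question's table, not statements captured from the connector):

```sql
-- Rejected: six positional values, so the first value targets the
-- identity column SourceSID, which is only allowed with a column
-- list and IDENTITY_INSERT ON.
INSERT INTO TargetTable_name
VALUES (1, 'src', 1, 'Pipe-123', GETDATE(), GETDATE());

-- Accepted: an explicit column list that omits the identity column,
-- so Synapse generates SourceSID itself.
INSERT INTO TargetTable_name (SourceName, IsRowActive, PipelineId,
                              ADFCreatedDateTime, ADFModifiedDateTime)
VALUES ('src', 1, 'Pipe-123', GETDATE(), GETDATE());
```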

The code was working fine on Databricks runtime 6.4 (Spark 2.4.5), and I started getting this error as soon as I upgraded the runtime. How can I get this to work?

Don't you have an extra column, "1 AS IsActiveRow"? I don't see it in the schema:

dftraffic = spark.sql('SELECT distinct SourceName \
              ,1 AS IsActiveRow \
              ,"Pipe-123" as pipelineId \
              ,current_timestamp as ADFCreatedDateTime \
              ,current_timestamp as ADFModifiedDateTime \
              from deltaTable')
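The IsActiveRow vs. IsRowActive discrepancy is easy to miss by eye. A minimal sketch of a pre-write check in plain Python (the two column lists are copied from the question; the check itself is illustrative, not part of the connector):

```python
# Compare the DataFrame's column names against the target table's
# writable (non-identity) columns before calling .save().
target_columns = ["SourceName", "IsRowActive", "PipelineId",
                  "ADFCreatedDateTime", "ADFModifiedDateTime"]  # SourceSID (identity) omitted
df_columns = ["SourceName", "IsActiveRow", "pipelineId",
              "ADFCreatedDateTime", "ADFModifiedDateTime"]      # dftraffic's columns

# Case-insensitive comparison, since SQL Server identifiers are
# case-insensitive by default.
target_set = {c.lower() for c in target_columns}
df_set = {c.lower() for c in df_columns}

missing = [c for c in target_columns if c.lower() not in df_set]
extra = [c for c in df_columns if c.lower() not in target_set]
print("missing in DataFrame:", missing)  # ['IsRowActive']
print("not in target table:", extra)     # ['IsActiveRow']
```

With a live session, `df_columns` would come from `dftraffic.columns` instead of a hand-copied list.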
