Databricks - Error writing to Azure Synapse
I am trying to write data to an Azure Synapse table that has an identity column, using the following code.

Code on Databricks:
```python
def get_jdbc_connection(host, sqlDatabase, user, password):
    jdbcHostname = "{}.database.windows.net".format(host)
    jdbc_url = "jdbc:sqlserver://{}:1433;database={};user={}@{};password={};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;".format(jdbcHostname, sqlDatabase, user, host, password)
    url = "jdbc:sqlserver://{}:1433;database={};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;".format(jdbcHostname, sqlDatabase)
    return (jdbc_url, url)

def write_adw(spark, df_target_adw, jdbc_url, table, tempDir, option_mode, pre_Actions):
    df_target_adw.write.format("com.databricks.spark.sqldw") \
        .option("url", jdbc_url) \
        .option("useAzureMSI", "true") \
        .option("preActions", pre_Actions) \
        .option("dbTable", table) \
        .option("tempDir", tempDir) \
        .mode(option_mode) \
        .save()
```
```python
dftraffic = spark.sql('SELECT distinct SourceName \
    ,1 AS IsActiveRow \
    ,"Pipe-123" as pipelineId \
    ,current_timestamp as ADFCreatedDateTime \
    ,current_timestamp as ADFModifiedDateTime \
    from deltaTable')
```
```python
# write to ADW
(jdbc_url, url_adw) = get_jdbc_connection(host, sqlDatawarehouse, user, password)
target_table = 'TargetTable_name'
option_mode = "append"
pre_Actions = "SELECT GETDATE()"
write_adw(spark, dftraffic, jdbc_url, target_table, tempDir, option_mode, pre_Actions)
```
Schema of the target table on ADW:
Column Name | Data Type
---|---
SourceSID | INT IDENTITY (1,1) NOT NULL
Source Name | VARCHAR(20) NOT NULL
IsRowActive | BIT NOT NULL
PipelineId | VARCHAR(20) NOT NULL
ADFCreatedDateTime | DATETIME NOT NULL
ADFModifiedDateTime | DATETIME NOT NULL
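One detail worth checking before the write: the DataFrame selects `IsActiveRow` while the table defines `IsRowActive`, and the DataFrame (correctly) omits the identity column `SourceSID`. The helper below is purely illustrative (it is not part of the Synapse connector): it derives the target's non-identity column list and flags DataFrame columns that have no case-insensitive match in the table.

```python
# Hypothetical helper, not from the original post or the connector:
# given the DataFrame's columns and the target table's columns, return
# the non-identity columns the write should target and any DataFrame
# columns with no matching table column (case-insensitive compare,
# since SQL Server collations are commonly case-insensitive).
def plan_write_columns(df_columns, table_columns, identity_columns):
    writable = [c for c in table_columns if c not in identity_columns]
    writable_lower = {c.lower() for c in writable}
    unmatched = [c for c in df_columns if c.lower() not in writable_lower]
    return writable, unmatched

# Using the schema above (SourceSID is the identity column):
table_cols = ["SourceSID", "SourceName", "IsRowActive", "PipelineId",
              "ADFCreatedDateTime", "ADFModifiedDateTime"]
df_cols = ["SourceName", "IsActiveRow", "pipelineId",
           "ADFCreatedDateTime", "ADFModifiedDateTime"]
writable, unmatched = plan_write_columns(df_cols, table_cols, {"SourceSID"})
# unmatched flags IsActiveRow: it has no counterpart in the table schema.
```

A check like this, run before `write_adw`, makes a name mismatch visible before the connector turns it into an opaque JDBC error.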
Configuration details on Databricks:

Databricks runtime 7.4 (includes Apache Spark 3.0.1, Scala 2.12)
Error message:
Py4JJavaError: An error occurred while calling o457.save.
: com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
Underlying SQLException(s): - com.microsoft.sqlserver.jdbc.SQLServerException: An explicit value for the identity column in table can only be specified when a column list is used and IDENTITY_INSERT is ON
The code was working fine on Databricks runtime 6.4 (Spark 2.4.5), and I started seeing this error as soon as I upgraded the runtime. How can I get this to work?
Don't you have the extra row "1 AS IsActiveRow"? I don't see that column in the schema.
```python
dftraffic = spark.sql('SELECT distinct SourceName \
    ,1 AS IsActiveRow \
    ,"Pipe-123" as pipelineId \
    ,current_timestamp as ADFCreatedDateTime \
    ,current_timestamp as ADFModifiedDateTime \
    from deltaTable')
```
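If the identity column itself is the problem, one commonly suggested pattern (sketched below under assumptions, not a confirmed fix for this connector version) is to load into a staging table that has no identity column, then move the rows into the real table with an `INSERT` that names its columns explicitly, for example via the connector's `postActions` option. The staging and target table names here are hypothetical.

```python
# Sketch: build an INSERT ... SELECT with an explicit column list, so the
# identity column is never assigned a value. Assumes a staging table
# (stg.TargetTable_name, hypothetical) created without the identity column.
def build_insert_with_column_list(staging_table, target_table, columns):
    col_list = ", ".join(columns)
    return ("INSERT INTO {} ({}) SELECT {} FROM {}"
            .format(target_table, col_list, col_list, staging_table))

sql = build_insert_with_column_list(
    "stg.TargetTable_name",   # hypothetical staging table
    "dbo.TargetTable_name",   # hypothetical schema-qualified target
    ["SourceName", "IsRowActive", "PipelineId",
     "ADFCreatedDateTime", "ADFModifiedDateTime"])
```

The resulting string could then be passed as `.option("postActions", sql)` on a write that targets the staging table, leaving `SourceSID` to be generated by the identity seed.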