简体   繁体   中英

Databricks - Error writing to Azure Synapse

I am trying to write data to Azure Synapse table with identity field with following code

code on databricks

def get_jdbc_connection(host, sqlDatabase, user, password):
  jdbcHostname = "{}.database.windows.net".format(host)
  jdbc_url = "jdbc:sqlserver://{}:1433;database={};user={}@{};password={};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;".format(jdbcHostname, sqlDatabase, user, host, password)
  url = "jdbc:sqlserver://{}:1433;database={};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;".format(jdbcHostname, sqlDatabase)
  return (jdbc_url,url )

def write_adw(spark, df_target_adw, jdbc_url, table, tempDir, option_mode, pre_Actions ):
    df_target_adw.write.format("com.databricks.spark.sqldw") \
        .option("url", jdbc_url) \
        .option("useAzureMSI", "true") \
        .option("preActions", pre_Actions) \
        .option("dbTable", table) \
        .option("tempDir", tempDir) \
        .mode(option_mode) \
        .save()
    
dftraffic = spark.sql('SELECT distinct SourceName\
              ,1 AS IsActiveRow \
             ,"Pipe-123" as pipelineId \
              ,current_timestamp as ADFCreatedDateTime \
              ,current_timestamp as ADFModifiedDateTime \
              from deltaTable')

#write to ADW
(jdbc_url, url_adw) = get_jdbc_connection(host, sqlDatawarehouse,user, password)
target_table = 'TargetTable_name'
option_mode= "append"
pre_Actions= " SELECT GETDATE()"
write_adw(spark, dftraffic, jdbc_url, target_table, tempDir, option_mode, pre_Actions )

schema of target table on adw

Column Name Data Type
SourceSID INT IDENTITY (1,1) NOT NULL
Source Name VARCHAR(20) NOT NULL
IsRowActive BIT NOT NULL
PipelineId VARCHAR(20) NOT NULL
ADFCreatedDateTime DATETIME NOT NULL
ADFModifiedDateTime DATETIME NOT NULL

Configuration details on databricks

Databricks runtime 7.4 (includes Apache Spark 3.0.1, Scala 2.12)

Error Message

Py4JJavaError: An error occurred while calling o457.save. : com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Underlying SQLException(s): - com.microsoft.sqlserver.jdbc.SQLServerException: An explicit value for the identity column in table can only be specified when a column list is used and IDENTITY_INSERT is ON

the code was working all fine on databricks runtime 6.4 Spark 2.4.5 and I am facing this error as soon I tried to upgrade dbk runtime. How can I get this to work?

Are you not having the extra row "1 AS IsActiveRow ".I dont see that in the Schema

dftraffic = spark.sql('SELECT distinct SourceName\
              ,1 AS IsActiveRow \
             ,"Pipe-123" as pipelineId \
              ,current_timestamp as ADFCreatedDateTime \
              ,current_timestamp as ADFModifiedDateTime \
              from deltaTable)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM