
Facing Issue in writing PySpark Dataframe to Azure Synapse

I have a PySpark DataFrame in Azure Databricks that I want to write to Azure Synapse, but I am getting the error below.

com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column 'ETL_TableName' has a data type that cannot participate in a columnstore index. 

I checked the connection to Synapse. Everything works fine and I am able to read data, but the error occurs only when writing. Could anyone please help me handle this error?

Code for writing data into Synapse:

dataFrame.repartition(1).write.format("jdbc")\
         .option("url", azureurl)\
         .option("tempDir", tempDir) \
         .option("forwardSparkAzureStorageCredentials", "true") \
         .option("dbTable", dbTable)\
         .option("append", "true")\
         .save()

A couple of things need to be changed.

The format should be changed from .format("jdbc") to .format("com.databricks.spark.sqldw").

Add the "tableOptions" option to your write statement; it takes the place of the WITH() clause of the CREATE TABLE (AS) statement. Synapse dedicated SQL pools create new tables with a clustered columnstore index by default, and some data types (such as NVARCHAR(MAX)) cannot participate in a columnstore index, so specifying heap avoids the error:

.option("tableOptions", "heap,distribution=MY_DISTRIBUTION")

The code should look like this:

dataFrame.repartition(1).write.format("com.databricks.spark.sqldw") \
         .option("tableOptions", "heap,distribution=HASH(rownum)") \
         .option("url", azureurl) \
         .option("tempDir", tempDir) \
         .option("forwardSparkAzureStorageCredentials", "true") \
         .option("dbTable", dbTable) \
         .mode("append") \
         .save()

Reference:

Azure Databricks - Azure Synapse Analytics

Choose a value for MY_DISTRIBUTION based on the following guidance:

Guidance for designing distributed tables in Synapse SQL pool
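Following that guidance, the choice of MY_DISTRIBUTION can be sketched as a small plain-Python helper. This is a hypothetical convenience function, not part of the connector; it only builds the string passed to "tableOptions":

```python
def table_options(distribution, hash_column=None):
    """Build the tableOptions string for the com.databricks.spark.sqldw writer.

    Hypothetical helper: mirrors the WITH() clause of CREATE TABLE in a
    Synapse SQL pool. distribution is "HASH", "ROUND_ROBIN", or "REPLICATE".
    """
    if distribution == "HASH":
        # Large fact tables: hash-distribute on a high-cardinality column
        if not hash_column:
            raise ValueError("HASH distribution needs a distribution column")
        return "heap,distribution=HASH({})".format(hash_column)
    if distribution in ("ROUND_ROBIN", "REPLICATE"):
        # ROUND_ROBIN suits staging tables; REPLICATE suits small dimensions
        return "heap,distribution={}".format(distribution)
    raise ValueError("unknown distribution: " + distribution)

print(table_options("HASH", "rownum"))  # heap,distribution=HASH(rownum)
print(table_options("ROUND_ROBIN"))    # heap,distribution=ROUND_ROBIN
```

The result is then passed as .option("tableOptions", table_options("HASH", "rownum")) in the write shown above.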

This may not be the exact answer to the issue above, but it may help someone get past it. I still have no clue about the reason behind the issue, but I noticed it occurs when writing to an Azure Synapse warehouse. Since I had no strict reason for sticking with the Synapse warehouse, and the whole priority was to write data from Databricks to Azure in a structured format, I replaced the Azure Synapse warehouse with an Azure SQL Server database, and it is working much better. I will update this answer once I find the actual reason behind the issue.
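For that workaround, the plain JDBC write to an Azure SQL Database looks roughly like this. The server, database, and credential names here are placeholders, and the DataFrame write itself is shown as comments because it only runs inside Databricks:

```python
def sqlserver_jdbc_url(server, database):
    """Build a JDBC URL for Azure SQL Database (placeholder server/database)."""
    return ("jdbc:sqlserver://{0}.database.windows.net:1433;"
            "database={1};encrypt=true;loginTimeout=30;").format(server, database)

url = sqlserver_jdbc_url("myserver", "mydb")  # hypothetical names

# Inside Databricks, the write would then be (user/password assumed defined):
# dataFrame.write.format("jdbc") \
#          .option("url", url) \
#          .option("dbtable", dbTable) \
#          .option("user", user) \
#          .option("password", password) \
#          .mode("append") \
#          .save()
```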
