
Facing Issue in writing PySpark Dataframe to Azure Synapse

I have a PySpark DataFrame in Azure Databricks that I want to write to Azure Synapse, but I am getting the error below.

com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column 'ETL_TableName' has a data type that cannot participate in a columnstore index. 

I checked the connection to Synapse. Everything works fine and I am able to read data, but the error occurs only when writing. Could anyone please help me handle this error?

Code for writing data into Synapse:

dataFrame.repartition(1).write.format("jdbc")\
         .option("url", azureurl)\
         .option("tempDir", tempDir) \
         .option("forwardSparkAzureStorageCredentials", "true") \
         .option("dbTable", dbTable)\
         .option("append", "true")\
         .save()

A couple of things need to be changed.

The format should be changed from .format("jdbc") to .format("com.databricks.spark.sqldw").

Add the "tableOptions" option to your write statement; it takes the place of the WITH() clause of the CREATE TABLE (AS) statement. Synapse dedicated SQL pools create new tables with a clustered columnstore index by default, and some data types (such as NVARCHAR(MAX)) cannot participate in a columnstore index, so specifying heap avoids the error:

.option("tableOptions", "heap,distribution=MY_DISTRIBUTION")

The code should look like this:

dataFrame.repartition(1).write.format("com.databricks.spark.sqldw") \
         .option("tableOptions", "heap,distribution=HASH(rownum)") \
         .option("url", azureurl) \
         .option("tempDir", tempDir) \
         .option("forwardSparkAzureStorageCredentials", "true") \
         .option("dbTable", dbTable) \
         .mode("append") \
         .save()

Reference:

Azure Databricks - Azure Synapse Analytics

Choose a value for MY_DISTRIBUTION based on the following guidance:

Guidance for designing distributed tables in Synapse SQL pool
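Following that guidance, the choice of MY_DISTRIBUTION can be sketched as a small plain-Python helper. This is a hypothetical convenience function, not part of the connector; it only builds the string passed to "tableOptions":

```python
def table_options(distribution, hash_column=None):
    """Build the tableOptions string for the com.databricks.spark.sqldw writer.

    Hypothetical helper: mirrors the WITH() clause of CREATE TABLE in a
    Synapse SQL pool. distribution is "HASH", "ROUND_ROBIN", or "REPLICATE".
    """
    if distribution == "HASH":
        # Large fact tables: hash-distribute on a high-cardinality column
        if not hash_column:
            raise ValueError("HASH distribution needs a distribution column")
        return "heap,distribution=HASH({})".format(hash_column)
    if distribution in ("ROUND_ROBIN", "REPLICATE"):
        # ROUND_ROBIN suits staging tables; REPLICATE suits small dimensions
        return "heap,distribution={}".format(distribution)
    raise ValueError("unknown distribution: " + distribution)

print(table_options("HASH", "rownum"))  # heap,distribution=HASH(rownum)
print(table_options("ROUND_ROBIN"))    # heap,distribution=ROUND_ROBIN
```

The result is then passed as .option("tableOptions", table_options("HASH", "rownum")) in the write shown above.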

This may not be the exact answer to the issue above, but it may help someone get past it. I still have no clue about the reason behind the issue, but I noticed it occurs when writing to an Azure Synapse warehouse. Since I had no strict reason for sticking with the Synapse warehouse, and the whole priority was to write data from Databricks to Azure in a structured format, I replaced the Azure Synapse warehouse with an Azure SQL Server database, and it is working much better. I will update this answer once I find the actual reason behind the issue.
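For that workaround, the plain JDBC write to an Azure SQL Database looks roughly like this. The server, database, and credential names here are placeholders, and the DataFrame write itself is shown as comments because it only runs inside Databricks:

```python
def sqlserver_jdbc_url(server, database):
    """Build a JDBC URL for Azure SQL Database (placeholder server/database)."""
    return ("jdbc:sqlserver://{0}.database.windows.net:1433;"
            "database={1};encrypt=true;loginTimeout=30;").format(server, database)

url = sqlserver_jdbc_url("myserver", "mydb")  # hypothetical names

# Inside Databricks, the write would then be (user/password assumed defined):
# dataFrame.write.format("jdbc") \
#          .option("url", url) \
#          .option("dbtable", dbTable) \
#          .option("user", user) \
#          .option("password", password) \
#          .mode("append") \
#          .save()
```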
