
spark.write.synapsesql options with Azure Synapse Spark Pool

Within Azure Synapse, I am using the synapsesql function in a Scala Spark Pool notebook to push the contents of a data frame into the SQL pool:

// Write data frame to SQL pool table (imports are from the Synapse Dedicated SQL Pool connector)
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

df2.write
  .option(Constants.SERVER, s"${pServerName}.sql.azuresynapse.net")
  .synapsesql(s"${pDatabaseName}.xtr.${pTableName}", Constants.INTERNAL)

This works successfully; however, I would like to add some extra functionality:

  1. How do I specify the index type to be HEAP instead of a clustered columnstore index? In Databricks this can be done with .option("tableOptions", "heap,distribution=ROUND-ROBIN"), but it doesn't work in the Spark Pool notebook.

  2. How do I overwrite the table in the SQL pool if it already exists? In Databricks there is saveAsTable, but I can't find anything similar in the Spark Pool notebook.

As per this official document:

When a table is created, by default the data structure has no indexes and is called a heap.

To use a HEAP index, simply specify HEAP in the WITH clause when creating the table:

CREATE TABLE myTable
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode varchar(6)  
  )  
WITH ( HEAP );
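
If you prefer to issue that DDL from the same Spark Pool notebook rather than from a SQL script, one option is a plain JDBC call against the dedicated SQL pool. The following is only a minimal sketch: sqlUser and sqlPassword are hypothetical placeholders for your own authentication, and it assumes the SQL Server JDBC driver that ships with the Synapse Spark runtime.

import java.sql.DriverManager

// Hypothetical placeholders -- supply your own credentials or switch to AAD auth
val sqlUser     = "<sql-login>"
val sqlPassword = "<sql-password>"

val jdbcUrl = s"jdbc:sqlserver://${pServerName}.sql.azuresynapse.net:1433;database=${pDatabaseName}"
val conn    = DriverManager.getConnection(jdbcUrl, sqlUser, sqlPassword)
try {
  // Create the target table as a heap before loading it
  conn.createStatement().execute(
    s"""CREATE TABLE xtr.${pTableName}
       |  (
       |    id int NOT NULL,
       |    lastName varchar(20),
       |    zipCode varchar(6)
       |  )
       |WITH ( HEAP )""".stripMargin)
} finally {
  conn.close()
}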

To overwrite, import the org.apache.spark.sql.SaveMode class in your Spark notebook and use Overwrite mode.

Overwrite mode means that when saving a DataFrame to a data source, if the data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame.

Refer to DataFrame write SaveMode support to learn more.
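
For reference, here is a minimal sketch of the write with Overwrite mode. It assumes the synapsesql writer honours the standard DataFrameWriter mode() setting described on that page, and it reuses the imports and parameters from the question.

import org.apache.spark.sql.SaveMode
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

// If the target table already exists, its contents are replaced by df2
df2.write
  .mode(SaveMode.Overwrite)
  .option(Constants.SERVER, s"${pServerName}.sql.azuresynapse.net")
  .synapsesql(s"${pDatabaseName}.xtr.${pTableName}", Constants.INTERNAL)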

Also check this example for reference.
