
spark.write.synapsesql options with Azure Synapse Spark Pool

Within Azure Synapse, I am using the synapsesql function in a Scala Spark Pool notebook to push the contents of a data frame into the SQL pool:

// Write data frame to SQL pool table (imports are from the Synapse Dedicated SQL Pool connector)
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

df2.write
  .option(Constants.SERVER, s"${pServerName}.sql.azuresynapse.net")
  .synapsesql(s"${pDatabaseName}.xtr.${pTableName}", Constants.INTERNAL)

This works successfully; however, I would like to add some extra functionality:

  1. How do I specify the index type to be HEAP instead of a clustered columnstore index? In Databricks this can be done with .option("tableOptions", "heap,distribution=ROUND-ROBIN"), but it doesn't work in the Spark Pool notebook.

  2. How do I overwrite the table in the SQL pool if it already exists? In Databricks there is saveAsTable, but I can't find anything similar in the Spark Pool notebook.

As per this official document:

When a table is created, by default the data structure has no indexes and is called a heap.

To use a HEAP index, simply specify HEAP in the WITH clause when creating the table:

CREATE TABLE myTable
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode varchar(6)  
  )  
WITH ( HEAP );
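
If you prefer to issue that DDL from the same Spark Pool notebook rather than from a SQL script, one option is a plain JDBC call against the dedicated SQL pool. The following is only a minimal sketch: sqlUser and sqlPassword are hypothetical placeholders for your own authentication, and it assumes the SQL Server JDBC driver that ships with the Synapse Spark runtime.

import java.sql.DriverManager

// Hypothetical placeholders -- supply your own credentials or switch to AAD auth
val sqlUser     = "<sql-login>"
val sqlPassword = "<sql-password>"

val jdbcUrl = s"jdbc:sqlserver://${pServerName}.sql.azuresynapse.net:1433;database=${pDatabaseName}"
val conn    = DriverManager.getConnection(jdbcUrl, sqlUser, sqlPassword)
try {
  // Create the target table as a heap before loading it
  conn.createStatement().execute(
    s"""CREATE TABLE xtr.${pTableName}
       |  (
       |    id int NOT NULL,
       |    lastName varchar(20),
       |    zipCode varchar(6)
       |  )
       |WITH ( HEAP )""".stripMargin)
} finally {
  conn.close()
}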

To overwrite, import the org.apache.spark.sql.SaveMode class in your Spark notebook and use Overwrite mode.

Overwrite mode means that when saving a DataFrame to a data source, if the data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame.

Refer to DataFrame write SaveMode support to learn more.
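
For reference, here is a minimal sketch of the write with Overwrite mode. It assumes the synapsesql writer honours the standard DataFrameWriter mode() setting described on that page, and it reuses the imports and parameters from the question.

import org.apache.spark.sql.SaveMode
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

// If the target table already exists, its contents are replaced by df2
df2.write
  .mode(SaveMode.Overwrite)
  .option(Constants.SERVER, s"${pServerName}.sql.azuresynapse.net")
  .synapsesql(s"${pDatabaseName}.xtr.${pTableName}", Constants.INTERNAL)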

Also check this example for reference.
