
Connection from Databricks to Azure Synapse

I'm new to working with cloud services, and I'm trying to connect Databricks to Azure Synapse. I have notebooks in Databricks that generate data frames, and I want to populate a dedicated SQL pool inside Synapse with them.

After following the steps recommended in the Microsoft documentation, I came across this error.

Code

df = spark.read \
  .format("com.databricks.spark.sqldw") \
  .option("url", <the-rest-of-the-connection-string>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "Table") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .load()

Error

Py4JJavaError: An error occurred while calling o1509.save.
: com.databricks.spark.sqldw.SqlDWConnectorException: Exception encountered in Azure Synapse Analytics connector code.

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 14

Some considerations

  • I created a simple data frame for testing purposes, in case the problem was the data frame itself (a minimal sketch of such a test frame is shown after this list)
  • I did not create an empty table in Synapse beforehand; I expect the connector to create it automatically
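A minimal test data frame of the kind described above might look like the following; the column names and values are purely illustrative assumptions, not the actual data:

# Hypothetical minimal data frame, used only to rule out the data itself as the cause
test_df = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],   # sample rows (illustrative only)
    ["id", "label"]                # sample column names (illustrative only)
)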

Could someone please help me understand this problem?

Follow the steps below:

Configure Azure storage account

# Note: if the tempDir below points to ADLS Gen2 (abfss://...dfs.core.windows.net), set the key on the dfs.core.windows.net endpoint instead of blob.core.windows.net
spark.conf.set("fs.azure.account.key.<your_storage_account>.blob.core.windows.net", "<your_storage_account_access_key>")

Azure Synapse configuration

Database = <Database_Name>
Server = <Server_Name>
User = <Database_Username>
Pass = <Database_Password>
JdbcPort =  "1433"
JdbcExtraOptions = "encrypt=true;trustServerCertificate=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
sqlUrl = f"jdbc:sqlserver://{Server}:{JdbcPort};database={Database};user={User};password={Pass};${JdbcExtraOptions}"

Azure Data Lake Gen 2

tempDir = "abfss://<container>@<your_storage_account_name>.dfs.core.windows.net/<folder>"

Azure Synapse Table

tableName = <your_sql_table>

Read Data from Azure Synapse

df = spark.read \
  .format("com.databricks.spark.sqldw") \
  .option("url", sqlUrl) \
  .option("tempDir", tempDir) \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", tableName) \
  .load()
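
Write Data to Azure Synapse

Since the original goal is to populate the dedicated SQL pool from a data frame built in Databricks, the same connector options can be used for writing. This is a sketch assuming df is the data frame produced in the notebook; with the Azure Synapse connector, the target table is created during the write, so it does not need to exist beforehand.

# Sketch: write the data frame to the dedicated SQL pool using the same connector
# (assumes df, sqlUrl, tempDir, and tableName are defined as above)
df.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", sqlUrl) \
  .option("tempDir", tempDir) \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", tableName) \
  .mode("overwrite") \
  .save()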

Reference:

How to connect to Azure Synapse in Azure Databricks by Joe
