Connecting from Azure Synapse Analytics Spark Pool to Azure SQL Database

Question

Did anyone had luck connecting from Azure Synapse Analytics provisioned Spark Pool to Azure SQL Database?

Problem 1:

I've uploaded the Spark SQL Connector https://github.com/microsoft/sql-spark-connector as a Workspace Library and linked to the Spark Pool. Installing this causes an error when trying to start Spark Pool Session. I get a Livy error. In the Monitor section the error is:

This application failed due to the total number of errors: 1.
Error code 1
LIBRARY_MANAGEMENT_FAILED

Message
[...] Cleaning up the Spark service job because the cluster has failed.

Edit: This actually works on another Spark Pool for no reason. I don't know the root cause but I was able to run this on another pool.

Problem 2: I'm trying to use the TokenLibrary with Azure SQL Linked Service. This code:

conn = TokenLibrary.getConnectionString("MyAzureSQLDev")
print(conn)

Displays something that looks like Base64-encrypted JWT token plus some unknown characters. This is not a connection string.

I am looking for any working solution.

Answer 1

TokenLibrary.getConnectionString("MyAzureSQLDev") returns the access token of the workspace identity (MSI). To use the token and write to the database I uploaded sql-spark-connector to workspace packages and wrote this code:

df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", mssparkutils.credentials.getConnectionStringOrCreds("MyAzureSQLDev")) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .mode("append") \
    .save()

The workspace identity has to be added to Azure SQL Database this way:

CREATE USER [your workspace identity] FROM EXTERNAL PROVIDER;

However...

...the currently released version of sql-spark-connector (version 1.0.1 from November 2020; see here ) is not compatible with the current version of Spark 2.4 used by Azure Synapse Analytics. The primary problem is with the version of SQL Server driver - Spark 2.4 on Azure Synapse provides version 8.4.1.jre8, whereas spark-mssql-connector:1.0.1 depends on version 7.2.1.jre8. Hence, installing spark-mssql-connector:1.0.1 on Azure Synapse and running the code above yields NoSuchMethodError when writing batches of data to the database.

Although spark-mssql-connector has not been released in a couple of months, it is still in active development and proper support for Spark 2.4 on Azure Synapse has been added in March 2021. I built the latest version from source and used the produced jar instead of the one on the Maven repo.

Answer 2

Just to update @mateharu 's answer, the following works in Synapse "out of the box" as of December 2021:

sql_server_name = "SOMETHING"
db_port = 1433
db_table = "SOMETHING"
db_name = "SOMETHING"
linked_service_name = "LINKEDSERVICENAME"

access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)

# Write
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .mode("append") \
    .save()

# Read
df2 = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()

Connecting from Azure Synapse Analytics Spark Pool to Azure SQL Database

Question

2 answers

solution1
2 ACCPTED 2021-04-16 11:05:55

solution2
1 2021-12-20 14:44:42

Connecting from Azure Synapse Analytics Spark Pool to Azure SQL Database

Question

2 answers

solution1 2 ACCPTED 2021-04-16 11:05:55

solution2 1 2021-12-20 14:44:42

solution1
2 ACCPTED 2021-04-16 11:05:55

solution2
1 2021-12-20 14:44:42