Has anyone had luck connecting from an Azure Synapse Analytics provisioned Spark pool to Azure SQL Database?
Problem 1:
I've uploaded the Spark SQL Connector https://github.com/microsoft/sql-spark-connector as a Workspace Library and linked it to the Spark pool. Installing it causes an error when I try to start a Spark pool session: I get a Livy error. In the Monitor section, the error is:
This application failed due to the total number of errors: 1.
Error code 1
LIBRARY_MANAGEMENT_FAILED
Message
[...] Cleaning up the Spark service job because the cluster has failed.
Edit: This actually works on another Spark pool for no apparent reason. I don't know the root cause, but I was able to run it on another pool.
Problem 2: I'm trying to use the TokenLibrary with an Azure SQL linked service. This code:
conn = TokenLibrary.getConnectionString("MyAzureSQLDev")
print(conn)
displays something that looks like a Base64-encoded JWT followed by some unknown characters. This is not a connection string.
I am looking for any working solution.
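Since the printed value looks like a JWT, one way to confirm what it actually is (and which resource it was issued for) is to decode its payload. The sketch below uses hypothetical helper names and only the Python standard library; it does not verify the signature, and the demo token is fake. For a token intended for Azure SQL Database, the `aud` (audience) claim would typically be `https://database.windows.net/`, and `exp` is the standard JWT expiry time in Unix seconds.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_claims(token: str) -> dict:
    """Return the JSON payload of a JWT. The signature is NOT verified."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url without '=' padding; add it back before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(claims: dict, now: Optional[float] = None) -> float:
    """Seconds left before the 'exp' claim (negative if already expired)."""
    current = time.time() if now is None else now
    return claims["exp"] - current


def _b64url(data: bytes) -> str:
    # Hypothetical helper used only to build the fake demo token below.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# A fake, unsigned token for demonstration only (not a real Azure AD token).
demo_token = ".".join([
    _b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
    _b64url(json.dumps({"aud": "https://database.windows.net/", "exp": 1700000000}).encode()),
    "",
])

claims = decode_jwt_claims(demo_token)
print(claims["aud"])  # https://database.windows.net/
print(seconds_until_expiry(claims, now=1699999940))  # 60
```

In a Synapse notebook you would pass the string returned by `TokenLibrary.getConnectionString("MyAzureSQLDev")` to `decode_jwt_claims`; if extra characters surround the token, they may need to be stripped first.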
TokenLibrary.getConnectionString("MyAzureSQLDev") returns the access token of the workspace identity (MSI). To use the token and write to the database, I uploaded sql-spark-connector to the workspace packages and wrote this code:
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
.option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
.option("dbtable", db_table) \
.option("accessToken", mssparkutils.credentials.getConnectionStringOrCreds("MyAzureSQLDev")) \
.option("encrypt", "true") \
.option("databaseName", db_name) \
.option("hostNameInCertificate", "*.database.windows.net") \
.mode("append") \
.save()
The workspace identity has to be added to Azure SQL Database this way:
CREATE USER [your workspace identity] FROM EXTERNAL PROVIDER;
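The statement above only creates the user; to read or write a table through the connector, that user also needs permissions in the database. The hypothetical helper below just generates the T-SQL text, assuming you want to grant the built-in `db_datareader` and `db_datawriter` roles (adjust this to your own permission model). It also assumes the identity's name is the Synapse workspace name, which is how the workspace's system-assigned managed identity is usually named. The statements have to be run while connected as an Azure AD user (for example, the server's Azure AD admin), because CREATE USER ... FROM EXTERNAL PROVIDER requires an Azure AD login.

```python
from typing import List, Sequence


def quote_sql_identifier(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping ']' as ']]'."""
    return "[" + name.replace("]", "]]") + "]"


def external_user_statements(
    identity_name: str,
    roles: Sequence[str] = ("db_datareader", "db_datawriter"),
) -> List[str]:
    """Build T-SQL that creates a user for an Azure AD identity
    and adds it to the given database roles."""
    user = quote_sql_identifier(identity_name)
    statements = ["CREATE USER %s FROM EXTERNAL PROVIDER;" % user]
    for role in roles:
        statements.append(
            "ALTER ROLE %s ADD MEMBER %s;" % (quote_sql_identifier(role), user)
        )
    return statements


# "my-synapse-workspace" is a placeholder for your workspace identity's name.
for stmt in external_user_statements("my-synapse-workspace"):
    print(stmt)
```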
However, the currently released version of sql-spark-connector (version 1.0.1, from November 2020) is not compatible with the Spark 2.4 runtime currently used by Azure Synapse Analytics. The primary problem is the version of the SQL Server JDBC driver: Spark 2.4 on Azure Synapse provides version 8.4.1.jre8, whereas spark-mssql-connector:1.0.1 depends on version 7.2.1.jre8. Hence, installing spark-mssql-connector:1.0.1 on Azure Synapse and running the code above yields a NoSuchMethodError when writing batches of data to the database.
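The incompatibility comes down to which driver version is loaded at runtime versus which one the connector was built against. As a rough illustration (not a real compatibility check), the hypothetical helpers below parse the mssql-jdbc version strings quoted above and flag a major-version difference, which is where binary-incompatible API changes, the usual cause of NoSuchMethodError, tend to appear.

```python
import re
from typing import Tuple


def parse_mssql_jdbc_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '8.4.1.jre8'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def major_versions_differ(provided: str, expected: str) -> bool:
    """True when the runtime's driver and the connector's dependency differ in major version."""
    return parse_mssql_jdbc_version(provided)[0] != parse_mssql_jdbc_version(expected)[0]


# Versions quoted above for Spark 2.4 on Azure Synapse and spark-mssql-connector:1.0.1.
synapse_driver = "8.4.1.jre8"
connector_dependency = "7.2.1.jre8"
print(major_versions_differ(synapse_driver, connector_dependency))  # True
```

On the pool itself, the loaded driver's major version could probably be read through the JVM gateway, e.g. `spark.sparkContext._jvm.com.microsoft.sqlserver.jdbc.SQLServerDriver().getMajorVersion()`; `_jvm` is a private PySpark attribute, so treat that as an assumption rather than a supported API.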
Although there has not been a new spark-mssql-connector release in a couple of months, the project is still under active development, and proper support for Spark 2.4 on Azure Synapse was added in March 2021. I built the latest version from source and used the resulting jar instead of the one from the Maven repository.
Just to update @mateharu's answer: as of December 2021, the following works in Synapse "out of the box":
sql_server_name = "SOMETHING"
db_port = 1433
db_table = "SOMETHING"
db_name = "SOMETHING"
linked_service_name = "LINKEDSERVICENAME"
access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)
# Write
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
.option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
.option("dbtable", db_table) \
.option("accessToken", access_token) \
.option("encrypt", "true") \
.option("databaseName", db_name) \
.option("hostNameInCertificate", "*.database.windows.net") \
.mode("append") \
.save()
# Read
df2 = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
.option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
.option("dbtable", db_table) \
.option("accessToken", access_token) \
.option("encrypt", "true") \
.option("databaseName", db_name) \
.option("hostNameInCertificate", "*.database.windows.net") \
.load()
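The read and write calls above repeat the same six options. One way to avoid the duplication, sketched here with a hypothetical helper name, is to build the options once and pass them with `.options(**opts)`, which both `DataFrameReader` and `DataFrameWriter` accept in PySpark. The option keys and values are copied from the code above; nothing else is assumed about the connector.

```python
from typing import Dict


def azure_sql_connector_options(
    server_name: str,
    db_name: str,
    db_table: str,
    access_token: str,
    port: int = 1433,
) -> Dict[str, str]:
    """Options shared by reads and writes through com.microsoft.sqlserver.jdbc.spark."""
    return {
        "url": "jdbc:sqlserver://%s.database.windows.net:%d" % (server_name, port),
        "dbtable": db_table,
        "accessToken": access_token,
        "encrypt": "true",
        "databaseName": db_name,
        "hostNameInCertificate": "*.database.windows.net",
    }


# In a Synapse notebook (not runnable outside one), usage would look like:
#   opts = azure_sql_connector_options(sql_server_name, db_name, db_table, access_token)
#   df.write.format("com.microsoft.sqlserver.jdbc.spark").options(**opts).mode("append").save()
#   df2 = spark.read.format("com.microsoft.sqlserver.jdbc.spark").options(**opts).load()

# Placeholder values for illustration only.
opts = azure_sql_connector_options("myserver", "mydb", "dbo.mytable", "<token>")
print(opts["url"])  # jdbc:sqlserver://myserver.database.windows.net:1433
```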