简体   繁体   中英

Connecting to Azure Synapse Serverless SQL using Python

I am trying to connect to Azure Serverless SQL using python but getting the following error

External table 'dbo.my_table' is not accessible because location does not exist or it is used by another process.

I have created the External table using following commands.

First, I created a database scope credential

CREATE DATABASE SCOPED CREDENTIAL python_access_credential WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET =  '<SAS TOKEN>';

Then I created the external data source with the above database scope credential using following command

CREATE EXTERNAL DATA SOURCE python_data WITH ( LOCATION ='abfss://<StorageAccount>.dfs.core.windows.net', CREDENTIAL= [python_access_credential]);

Then I created the external table

CREATE EXTERNAL TABLE my_table ([C1] bigint, [C2] nvarchar(4000))WITH (LOCATION ='data.tsv', DATA_SOURCE = [python_data],FILE_FORMAT = [SynapseDelimitedTextFormat])

Then, I ran the following command to grant access to the user used in my python script

GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[python_access_credential] TO [user];

Also, same error occurs when I try to access the table via Synapse Studio

I had referred to many Azure documents, Microsoft Q&A pages and one other stack overflow question which I found , but couldn't find a solution. Most answers pointed towards using DATABASE SCOPE CREDENTIALS and using it in CREDENTIAL param while creating an external data source, which I have already done.

I suggest you create a linked dataset and use the GUI to do all this work for you. Today, I am going to walk thru the process for the [sales_lt] database and [address] table that is stored in the data lake using the parquet format.

在此处输入图像描述

There are two types of workspace databases you can create. If you are not using spark SQL, select the first type.

在此处输入图像描述

Click the plus icon and add a SQL database called [stack_overflow] to the serverless pools. This type of pool is great since you only get charged when you query the data.

在此处输入图像描述

We can see from the above image that are new database, [stack_overflow] now exists.

在此处输入图像描述

If you are familiar with Azure Data Factory, configure a linked service that connects to your data storage and browse to the file that you want to convert into a table. Right click the "create external table" menu option.

Please see MSDN article for "how to create a linked service".

https://docs.microsoft.com/en-us/azure/data-factory/concepts-linked-services?tabs=data-factory

在此处输入图像描述

It will ask you what is the default text field size - 4000 bytes.

在此处输入图像描述

Next, choose to create and open a script. There is a bug in the GUI interface. While, it states you can create a new schema and new table with in the data entry field, it does not create the code for the schema.

在此处输入图像描述

Just add lines 13/14 and change 16 to prefix the table with the schema called [sales_lt]. Please note, this creates the code that you showed above wo any real work on your part.

在此处输入图像描述

We need to add an AD user to the synapse workspace and give it rights to view the table. I am going to start with sysadmin rights. You can reduce the rights once you have it working.

在此处输入图像描述

Find the end point for the Synapse Server-less Pools. In short, the service looks like and acts like SQL server with the tabular data stream (TDS) as the format of the data.

We can use the latest driver from Microsoft and write code for python to connect to the database. We need to use Active Directory with password option. Synapse does not support non-active directory users.

Please see the MSDN documentation for how to download latest driver (native + odbc).

https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-using-azure-active-directory-authentication?view=sql-server-ver16

I am a DBA by trade, while I am crafty with other languages. I am going to test the connection using SQL Server Management Studio (SSMS).

在此处输入图像描述

Enter the fully qualified address to the Synapse Server-less Pool. Add the user, dilbert@craftydba.com and the correct password.

在此处输入图像描述

The advance tab allows you to make connection changes. Make sure TCP/IP is used as the network protocol and the default database is [stack_overflow].

在此处输入图像描述

Last but not least, the select statement retrieves data from the Synapse Server-less pool and returns it to the client, SSMS. In short, the solution works. Now, we just need to code the connection and data retrieval using python.

Here is a link on how to use the latest ODBC driver for SQL Server for Python.

https://datatofish.com/how-to-connect-python-to-sql-server-using-pyodbc/

Please remember, server-less pools provide read access to file in the data lake. These files look like tables in a database. No write options are supported with this service.

I was able to solve the problem by using Managed Identity credentials instead of Shared Access Signature. Ran the following code to create the credential and external table

Create a master key first.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<your_password_1>'

Now create a Managed Identity database scoped credential

CREATE DATABASE SCOPED CREDENTIAL <your_credential_name> WITH IDENTITY = 'Managed Identity'

Create a login user along with password. You will use it in your python script to run queries

CREATE LOGIN <your_login> WITH PASSWORD = '<your_login_password>'

The above login is to be used as UID in python script. Now create user

CREATE USER <your_user> FOR LOGIN <your_login>

GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::<your_credential_name> TO<your_user>

Now create an external data source with the above Credentials and use that data source to create an external table

IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = '<FileFormatName>') 
CREATE EXTERNAL FILE FORMAT [<FileFormatName>] 
WITH ( FORMAT_TYPE = DELIMITEDTEXT ,
       FORMAT_OPTIONS (
         FIELD_TERMINATOR = ',',
         USE_TYPE_DEFAULT = FALSE
        ))
GO

IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = '<ExternalDataSourceName>') 
CREATE EXTERNAL DATA SOURCE [<ExternalDataSourceName>] 
WITH (
    CREDENTIAL = <your_credential_name>,
    LOCATION = 'abfss://<container>@<storage-account>.dfs.core.windows.net' 
)
GO

CREATE EXTERNAL TABLE [<table_name>] (
[C1] nvarchar(4000),
[C2] nvarchar(4000),
[C3] bigint,
[C4] bigint
)
WITH (
LOCATION = <Location To File in Container>,
DATA_SOURCE = [<ExternalDataSourceName>],
FILE_FORMAT = [<FileFormatName>]
)
GO

Now, final step, grant Select access

GRANT SELECT ON OBJECT::[dbo].[<table_name>] TO [<your_user>]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM