
Writing to ADLS from Synapse Notebook with account key

I am trying to write a file from an Azure Synapse Notebook to ADLS Gen2 while authenticating with the account key.

When I use Python and the DataLakeServiceClient, I can authenticate with the key and write a file without a problem. But if I try to authenticate with the same key from Spark, I get java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, PUT.

With PySpark and authorization with the account key [NOT WORKING]:

# Register the account key for the target storage account in the session config
myaccountname = ""
account_key = ""
spark.conf.set(f"fs.azure.account.key.{myaccountname}.dfs.core.windows.net", account_key)

# Build the abfss destination URI and write the DataFrame
dest_container = "container_name"
dest_storage_name = "storage_name"
destination_storage = f"abfss://{dest_container}@{dest_storage_name}.dfs.core.windows.net"

df.write.mode("append").parquet(destination_storage + "/raw/myfile.parquet")

But I can write a file with Python and the DataLakeServiceClient, also authenticating with the account key [WORKING]:

from azure.storage.filedatalake import DataLakeServiceClient

# DAP ADLS configurations
storage_name = ""
account_key = ""
container_name = ""
directory_name = ""
file_name = ""
file_content = b""

# Authenticate with the account key and get a client for the container
service_client = DataLakeServiceClient(account_url=f"https://{storage_name}.dfs.core.windows.net", credential=account_key)
file_system_client = service_client.get_file_system_client(container_name)

# Create the directory and file, then upload and flush the content
dir_client = file_system_client.get_directory_client(directory_name)
dir_client.create_directory()
file_client = dir_client.get_file_client(file_name)
file_client.create_file()
file_client.append_data(file_content, offset=0, length=len(file_content))
file_client.flush_data(len(file_content))

What am I doing wrong? I was under the impression that setting the account key for the URL with spark.conf.set is enough?

--Update

Can you double check that you, or the user running this, have ADLS Gen2 access and the right permissions (Contributor role on the subscription, Storage Blob Data Owner at the storage account level, or Storage Blob Data Contributor role granted to the service principal in the scope of the Data Lake Storage Gen2 storage account), depending on your setup?

Make sure you have a valid account key copied from the Azure portal.

Just in case....

To enable other users to use the storage account after you create your workspace, you will have to perform the following tasks:

  • Assign other users to the Contributor role on the workspace
  • Assign other users to a Workspace, SQL, or Spark admin role using Synapse Studio
  • Assign yourself and other users to the Storage Blob Data Contributor role on the storage account

Also, if you are using a managed identity (MSI) for the Synapse workspace, make sure that you as a user have the same permission level in the notebook.
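
As a quick way to verify those role assignments, here is a minimal sketch (not part of the original answer) that writes a small test file using the signed-in Azure AD identity instead of the account key; the account and container names are placeholders, and the azure-identity package must be installed:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# If the identity running the notebook has Storage Blob Data Contributor (or Owner)
# on the storage account, this AAD-authenticated write should succeed.
# "<storage_account>" and "<container>" are placeholders.
service_client = DataLakeServiceClient(
    account_url="https://<storage_account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system_client = service_client.get_file_system_client("<container>")
file_client = file_system_client.get_file_client("permission_check.txt")
file_client.upload_data(b"ok", overwrite=True)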


Going through the official MS docs on Azure Synapse connecting to an Azure storage account:

In case you have set up an account key and secret for the storage account, you can set forwardSparkAzureStorageCredentials to true, in which case the Azure Synapse connector automatically discovers the account access key set in the notebook session configuration or the global Hadoop configuration, and forwards it to the connected Azure Synapse instance by creating a temporary Azure database scoped credential.

Just add this option to the df.write call:

.option("forwardSparkAzureStorageCredentials", "true")

I finally solved it by using a LinkedService. In the LinkedService I used the AccountKey (retrieved from a Key Vault).

For some reason, direct authentication with the account key in the code did not work in the Synapse Notebook, despite the user having all the required permissions.

UPDATE: According to Microsoft's third-level tech support, authentication with an account key from within Synapse is not possible (!!!). You HAVE to use their LinkedServices.

If anyone else needs to authenticate:

linkedServiceName_var = "my_linked_service_name"

# Authenticate via a SAS token obtained from the linked service
spark.conf.set("fs.azure.account.auth.type", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")
spark.conf.set("spark.storage.synapse.linkedServiceName", linkedServiceName_var)

raw_container_name = "my_container"
raw_storageaccount_name = "my_storage_account"
CONNECTION_STR = f"abfs://{raw_container_name}@{raw_storageaccount_name}.dfs.core.windows.net"

my_df = spark.read.parquet(CONNECTION_STR + "/" + filepath)
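
The original question was about writing; with the same session configuration in place, a write should work the same way. A minimal sketch (the destination path is a placeholder):

# Sketch only: write back through the linked-service-based SAS configuration above
my_df.write.mode("append").parquet(CONNECTION_STR + "/raw/myfile.parquet")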
