Writing to ADLS from Synapse Notebook with account key

I am trying to write a file from an Azure Synapse Notebook to ADLS Gen2 while authenticating with the account key.

When I use Python and the DataLakeServiceClient, I can authenticate via key and write a file without a problem. If I try to authenticate with the same key in Spark, I get java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, PUT.

With PySpark and authorization with the account key [NOT WORKING]:

myaccountname = ""
account_key = ""
# Set the account key for the ABFS driver in the Spark session configuration
spark.conf.set(f"fs.azure.account.key.{myaccountname}.dfs.core.windows.net", account_key)

dest_container = "container_name"
dest_storage_name = "storage_name"
destination_storage = f"abfss://{dest_container}@{dest_storage_name}.dfs.core.windows.net"

df.write.mode("append").parquet(destination_storage + "/raw/myfile.parquet")

But I can write a file with Python and the DataLakeServiceClient, also authorizing with the account key [WORKING]:

from azure.storage.filedatalake import DataLakeServiceClient

# DAP ADLS configurations (values elided)
storage_name = ""
account_key = ""
container_name = ""
directory_name = ""
file_name = ""
file_content = b""

service_client = DataLakeServiceClient(account_url=f"https://{storage_name}.dfs.core.windows.net", credential=account_key)
file_system_client = service_client.get_file_system_client(container_name)

dir_client = file_system_client.get_directory_client(directory_name)
dir_client.create_directory()
file_client = dir_client.get_file_client(file_name)
file_client.create_file()
file_client.append_data(file_content, offset=0, length=len(file_content))
file_client.flush_data(len(file_content))
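
As a side note, newer versions of the azure-storage-file-datalake SDK also expose upload_data, which wraps the create/append/flush sequence into a single call; a minimal sketch under that assumption (not from the original post):

# Sketch: same upload in one call (assumes azure-storage-file-datalake >= 12.1)
file_client = dir_client.get_file_client(file_name)
file_client.upload_data(file_content, overwrite=True)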

What am I doing wrong? I was under the impression that using spark.conf.set for a URL-key is enough?

--Update

Can you double-check that you, or the user running this, has ADLS Gen2 access and the right permissions (Contributor role on the subscription, Storage Blob Data Owner at the storage account level, or Storage Blob Data Contributor role for the service principal in the scope of the Data Lake Storage Gen2 storage account), depending on your setup?

Make sure you have copied a valid account key from the Azure portal.

Just in case...

To enable other users to use the storage account after you create your workspace, you will have to perform the below tasks:

  • Assign other users to the Contributor role on the workspace
  • Assign other users to a Workspace, SQL, or Spark admin role using Synapse Studio
  • Assign yourself and other users to the Storage Blob Data Contributor role on the storage account

Also, if you are using MSI for the Synapse workspace, make sure that you as a user have the same permission level in the notebook.


Going through the official MS docs on Azure Synapse connecting to an Azure storage account:

In case you have set up an account key and secret for the storage account, you can set forwardSparkAzureStorageCredentials to true, in which case the Azure Synapse connector automatically discovers the account access key set in the notebook session configuration or the global Hadoop configuration and forwards the storage account access key to the connected Azure Synapse instance by creating a temporary Azure database scoped credential.

Just add this option to df.write:

.option("forwardSparkAzureStorageCredentials", "true")

I finally solved it by using a LinkedService. In the LinkedService I used the AccountKey (retrieved from a KeyVault).

For some reason, authentication with the account key in the code did not work in the Synapse Notebook, despite the user having all required permissions.

UPDATE: According to Microsoft's third-level tech support, authentication with an account key from within Synapse is not possible (!!!). You HAVE to use their LinkedServices.

If anyone else needs to authenticate:

# Authenticate through a Synapse LinkedService (SAS token provider)
linkedServiceName_var = "my_linked_service_name"
spark.conf.set("fs.azure.account.auth.type", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")
spark.conf.set("spark.storage.synapse.linkedServiceName", linkedServiceName_var)

raw_container_name = "my_container"
raw_storageaccount_name = "my_storage_account"
CONNECTION_STR = f"abfs://{raw_container_name}@{raw_storageaccount_name}.dfs.core.windows.net"

# filepath is the path of the file within the container
my_df = spark.read.parquet(CONNECTION_STR + "/" + filepath)
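
Since the original goal was a write rather than a read, the same linked-service configuration should also apply to writes; a minimal sketch reusing CONNECTION_STR from above and the target path from the question:

# Sketch: writing with the same LinkedService-based SAS configuration
my_df.write.mode("append").parquet(CONNECTION_STR + "/raw/myfile.parquet")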
