
Access data from ADLS using Azure Databricks

I am trying to access data files stored in an ADLS location from Azure Databricks using storage account access keys. To access the data files, I am using a Python notebook in Azure Databricks, and the command below works fine:

spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
  "<access-key>"
)

However, when I try to list the directory using the command below, it throws an error:

dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net")

ERROR:

ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://<storage-account-name>.dfs.core.windows.net/<container-name>?upn=false&resource=filesystem&maxResults=500&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:<request-id> Time:2021-08-03T08:53:28.0215432Z"

I am not sure what permission it would require or how to proceed with it.

Also, I am using ADLS Gen2 and Azure Databricks (Trial - Premium).

Thanks in advance!

The complete config key is called "spark.hadoop.fs.azure.account.key.adlsiqdigital.dfs.core.windows.net".
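For context, a minimal sketch of where each form of the key goes; the placeholders are assumptions standing in for your own account and key. In the cluster's Spark config the key carries the spark.hadoop. prefix, while spark.conf.set in a notebook takes the bare Hadoop key:

# In the cluster Spark config (Advanced options > Spark), use the prefixed form:
#   spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <access-key>
# In a notebook session, set the bare Hadoop key instead:
spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
  "<access-key>"
)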

However, for a production environment it would be beneficial to use a service account and a mount point. That way, actions on the storage can be traced back to this application more easily than with the generic access key, and the mount point avoids specifying the connection string everywhere in your code.
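A minimal sketch of that approach, mounting via a service principal with OAuth; the application ID, directory ID, and secret scope names are hypothetical placeholders, following the documented ClientCredsTokenProvider pattern:

# Hedged sketch: mount ADLS Gen2 with a service principal; all <...> values are placeholders.
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<application-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "<scope-name>", key = "<service-credential-key>"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs
)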

Alternatively, with the storage account access key, try this out:

spark.conf.set("fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net","<your-storage-account-access-key>")
dbutils.fs.mount(source = "abfss://<container-name>@<your-storage-account-name>.dfs.core.windows.net/", mount_point = "/mnt/test")
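After mounting, the directory listing from the question should work against the mount point:

dbutils.fs.ls("/mnt/test")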

You can mount the ADLS storage account using an access key via Databricks and then read/write data. Please try the code below:

# Mount Blob Storage (wasbs) with an account key stored in a Databricks secret scope
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}
)

dbutils.fs.ls("/mnt/<mount-name>")
