
Databricks fails accessing a Data Lake Gen1 while trying to enumerate a directory

I am using (well... trying to use) Azure Databricks and I have created a notebook.

I would like the notebook to connect to my Azure Data Lake (Gen1) and transform the data. I followed the documentation and put the following code in the first cell of my notebook:

spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "**using the application ID of the registered application**")
spark.conf.set("dfs.adls.oauth2.credential", "**using one of the registered application keys**")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/**using my-tenant-id**/oauth2/token")

dbutils.fs.ls("adl://**using my data lake uri**.azuredatalakestore.net/tenantdata/events")

The execution fails with this error:

com.microsoft.azure.datalake.store.ADLException: Error enumerating directory /

Operation null failed with exception java.io.IOException : Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/ using my-tenant-id /oauth2/token
Last encountered exception thrown after 5 tries.
[java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException] [ServerRequestId:null]
at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectoryInternal(ADLStoreClient.java:558)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:534)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:398)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:384)

I have given the registered application the Reader role on the Data Lake:

(screenshot showing the role assignment)

Question

How can I allow Spark to access the Data Lake?

Update

I have granted both the tenantdata and events folders Read and Execute access:

(screenshot: folder permissions granted)

The RBAC roles on the Gen1 lake do not grant access to the data (only to the resource itself), with the exception of the Owner role, which grants Super User access and therefore full data access.

You must grant access to the folders/files themselves using POSIX-style ACLs, either via Data Explorer in the Azure Portal or via Azure Storage Explorer.

This guide explains in detail how to do that: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control

Reference: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data

Only the Owner role automatically enables file system access. The Contributor, Reader, and all other roles require ACLs to enable any level of access to folders and files.
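As a sketch, the same ACLs can also be granted from the Azure CLI with `az dls fs access set-entry` (the account name `MyLake` and `<object-id>` are placeholders; `<object-id>` would be the AAD object ID of the registered application's service principal):

```shell
# Execute (--x) on every parent folder is required just to traverse the path;
# Read + Execute (r-x) on the target folders lets the service principal
# enumerate them (e.g. via dbutils.fs.ls).
az dls fs access set-entry --account MyLake --path / \
    --acl-spec "user:<object-id>:--x"
az dls fs access set-entry --account MyLake --path /tenantdata \
    --acl-spec "user:<object-id>:r-x"
az dls fs access set-entry --account MyLake --path /tenantdata/events \
    --acl-spec "user:<object-id>:r-x"
```

Note that these set access ACLs on existing items only; to have new children inherit the permission, a default ACL entry (`default:user:<object-id>:r-x`) would also need to be set on the folders.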
