
How to connect Azure Data Lake Store gen 2 File Share with Azure Databricks?

I have an Azure Data Lake Storage Gen2 account with the hierarchical namespace enabled. I generated a SAS token for the account, and I receive data into a folder in the File Share (File service). Now I want to access these files from Azure Databricks with Python. However, it seems that Azure Databricks can only access the File System (the Gen2 equivalent of a blob container), not the File Share. I also failed to generate a SAS token scoped to the File System.

I want a storage instance for which I can generate a SAS token to give to my client, and which I can also access from Azure Databricks using Python. It does not matter whether it is a File System, a File Share, ADLS Gen2, or Gen1, as long as it works.

I use the following to mount the File System from Databricks:

# OAuth (service principal) configuration for the ABFS driver;
# client id, secret, and tenant id are placeholders
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "my_client_id",
    "fs.azure.account.oauth2.client.secret": "my_client_secret",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/my_tenant_id/oauth2/token",
    "fs.azure.createRemoteFileSystemDuringInitialization": "true",
}

dbutils.fs.mount(
    source="abfss://my_file_system@my_storage_account.dfs.core.windows.net/MyFolder",
    mount_point="/mnt/my_mount",
    extra_configs=configs,
)
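As a side note, the mount source is just a string in a fixed abfss:// format; a minimal sketch of assembling it from its parts (the helper name and placeholder values are illustrative, not part of the original code):

```python
def abfss_source(file_system: str, account: str, folder: str = "") -> str:
    """Build the abfss:// URI used as the `source` of dbutils.fs.mount."""
    base = f"abfss://{file_system}@{account}.dfs.core.windows.net"
    return f"{base}/{folder}" if folder else base
```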

This works fine, but I cannot make it access the File Share. I also have a SAS token with a connection string like this:

connection_string = (
    'BlobEndpoint=https://<my_storage>.blob.core.windows.net/;'+
    'QueueEndpoint=https://<my_storage>.queue.core.windows.net/;'+
    'FileEndpoint=https://<my_storage>.file.core.windows.net/;'+
    'TableEndpoint=https://<my_storage>.table.core.windows.net/;'+
    'SharedAccessSignature=sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2019-09-26T17:12:38Z&st=2019-08-26T09:12:38Z&spr=https&sig=<my_sig>'
)

I can use this to upload files to the File Share, but not to the File System. Is there any kind of Azure storage that can be accessed both with a SAS token and from Azure Databricks?
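For reference, a connection string like the one above is just semicolon-separated key=value segments, where the SAS value itself contains further "=" characters, so only the first "=" of each segment acts as a separator. A minimal parsing sketch (the helper name is illustrative):

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure storage connection string into its key/value parts."""
    parts = {}
    for segment in conn_str.strip(";").split(";"):
        # partition() splits on the first '=' only; the SAS token value
        # contains '=' characters of its own
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts
```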

Steps to connect to an Azure File Share from Databricks

First, install the Microsoft Azure Storage File Share client library for Python in Databricks with pip install azure-storage-file-share. https://pypi.org/project/azure-storage-file-share/

After installing it, make sure you have a storage account. Then you can create a file share from Databricks:

from azure.storage.fileshare import ShareClient

share = ShareClient.from_connection_string(
    conn_str="<connection string of the form FileEndpoint=https://<storage_account>.file.core.windows.net/;SharedAccessSignature=<sas_token>>",
    share_name="<name of the file share to create>")

share.create_share()

See https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string for more on connection strings.
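The placeholder conn_str above has the shape FileEndpoint=...;SharedAccessSignature=...; a small sketch of assembling it from an account name and SAS token (the helper name is illustrative):

```python
def file_share_conn_str(account: str, sas_token: str) -> str:
    """Assemble a minimal File-Share connection string from its two parts."""
    return (f"FileEndpoint=https://{account}.file.core.windows.net/;"
            f"SharedAccessSignature={sas_token}")
```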

Code to upload a file to the file share from Databricks:

from azure.storage.fileshare import ShareFileClient
 
file_client = ShareFileClient.from_connection_string(
    conn_str="<connection string of the form FileEndpoint=https://<storage_account>.file.core.windows.net/;SharedAccessSignature=<sas_token>>",
    share_name="<your_fileshare_name>",
    file_path="my_file")
 
with open("./SampleSource.txt", "rb") as source_file:
    file_client.upload_file(source_file)

Refer to https://pypi.org/project/azure-storage-file-share/ for further information.
