
How to connect Azure Data Lake Store gen 2 File Share with Azure Databricks?

I have an Azure Data Lake Storage Gen2 account with hierarchical namespace enabled. I generated a SAS token for the account, and I receive data into a folder in the File Share (File Service). Now I want to access these files through Azure Databricks and Python. However, it seems that Azure Databricks can only access the File System (called a Blob Container in Gen1), not the File Share. I also failed to generate a SAS token for the File System.

I want a storage instance for which I can generate a SAS token to give to my client, and which I can also access from Azure Databricks using Python. Whether that is a File System, a File Share, ADLS Gen2, or Gen1 does not matter, as long as it works somehow.

I use the following to access the File System from Databricks:

# OAuth (service principal) settings for the ABFS driver
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "my_client_id",
           "fs.azure.account.oauth2.client.secret": "my_client_secret",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + "My_tenant_id" + "/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

# Mount the ADLS Gen2 file system into DBFS
dbutils.fs.mount(source = "abfss://" + "my_file_system" + "@" + "my_storage_account" + ".dfs.core.windows.net/MyFolder",
                 mount_point = "/mnt/my_mount",
                 extra_configs = configs)

This works fine, but I cannot make it access the File Share. I also have a SAS token with a connection string like this:

connection_string = (
    'BlobEndpoint=https://<my_storage>.blob.core.windows.net/;'+
    'QueueEndpoint=https://<my_storage>.queue.core.windows.net/;'+
    'FileEndpoint=https://<my_storage>.file.core.windows.net/;'+
    'TableEndpoint=https://<my_storage>.table.core.windows.net/;'+
    'SharedAccessSignature=sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2019-09-26T17:12:38Z&st=2019-08-26T09:12:38Z&spr=https&sig=<my_sig>'
)

I manage to use it to upload things to the File Share, but not to the File System. Is there any kind of Azure storage that can be accessed both with a SAS token and from Azure Databricks?

Steps to connect to an Azure File Share from Databricks

First, install the Microsoft Azure Storage File Share client library for Python using pip in Databricks: https://pypi.org/project/azure-storage-file-share/
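For example, in a Databricks notebook cell (the %pip magic installs into the notebook's environment):

%pip install azure-storage-file-share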

After installing it, create a storage account (if you do not already have one). Then you can create a file share from Databricks:

from azure.storage.fileshare import ShareClient

# The connection string has the form:
# FileEndpoint=https://<storageaccountname>.file.core.windows.net/;SharedAccessSignature=<sasToken>
share = ShareClient.from_connection_string(
    conn_str="<your_connection_string>",
    share_name="<file share name that you want to create>")

share.create_share()
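To verify the result, you can list the share's contents with a ShareDirectoryClient; a minimal sketch, assuming the same connection string and share name as above:

from azure.storage.fileshare import ShareDirectoryClient

# An empty directory_path points at the root of the share
parent_dir = ShareDirectoryClient.from_connection_string(
    conn_str="<your_connection_string>",
    share_name="<your_fileshare_name>",
    directory_path="")

for item in parent_dir.list_directories_and_files():
    print(item["name"])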

Use this for further reference: https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string

Code to upload a file into the file share through Databricks:

from azure.storage.fileshare import ShareFileClient

file_client = ShareFileClient.from_connection_string(
    conn_str="<your_connection_string>",
    share_name="<your_fileshare_name>",
    file_path="my_file")

# Upload a local file to the share under the path given above
with open("./SampleSource.txt", "rb") as source_file:
    file_client.upload_file(source_file)
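Downloading works the same way in reverse; a minimal sketch using the same client:

from azure.storage.fileshare import ShareFileClient

file_client = ShareFileClient.from_connection_string(
    conn_str="<your_connection_string>",
    share_name="<your_fileshare_name>",
    file_path="my_file")

# Stream the remote file into a local file
with open("./DownloadedFile.txt", "wb") as target_file:
    data = file_client.download_file()
    data.readinto(target_file)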

Refer to this link for further information: https://pypi.org/project/azure-storage-file-share/
