How to connect Azure Data Lake Store gen 2 File Share with Azure Databricks?
I have an Azure Data Lake Storage Gen2 account with hierarchical namespace enabled. I generated a SAS token for the account, and I receive data into a folder in the File Share (File Service). Now I want to access these files from Azure Databricks using Python. However, it seems that Azure Databricks can only access the File System (called Blob Container in Gen1), not the File Share. I also failed to generate a SAS token for the File System.

I want a storage instance for which I can generate a SAS token to give to my client, and which I can also access from Azure Databricks using Python. Whether it is a File System, a File Share, ADLS Gen2, or Gen1 does not matter, as long as it somehow works.
I use the following to access the File System from Databricks:
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "my_client_id",
           "fs.azure.account.oauth2.client.secret": "my_client_secret",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + "My_tenant_id" + "/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(source = "abfss://" + "my_file_system" + "@" + "my_storage_account" + ".dfs.core.windows.net/MyFolder",
                 mount_point = "/mnt/my_mount",
                 extra_configs = configs)
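To make the string concatenation above easier to follow, here is a minimal sketch (plain Python, runnable outside Databricks) that assembles the `abfss://` source URI and the OAuth config dictionary from the same placeholder names. The account, container, and tenant names are placeholders, not real values:

```python
# Sketch: build the abfss:// URI and the Hadoop ABFS OAuth configs used above.
# All names (container, account, tenant, client id/secret) are placeholders.

def abfss_source(container: str, account: str, folder: str = "") -> str:
    """abfss://<container>@<account>.dfs.core.windows.net[/<folder>]"""
    uri = f"abfss://{container}@{account}.dfs.core.windows.net"
    return f"{uri}/{folder}" if folder else uri

def oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Service-principal (OAuth client credentials) configuration for ABFS."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.azure.createRemoteFileSystemDuringInitialization": "true",
    }

print(abfss_source("my_file_system", "my_storage_account", "MyFolder"))
# abfss://my_file_system@my_storage_account.dfs.core.windows.net/MyFolder
```

The two helpers only build strings; the actual mount still happens via `dbutils.fs.mount` inside Databricks.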
This works fine, but I cannot make it access the File Share. I also have a SAS token with a connection string like this:
connection_string = (
'BlobEndpoint=https://<my_storage>.blob.core.windows.net/;'+
'QueueEndpoint=https://<my_storage>.queue.core.windows.net/;'+
'FileEndpoint=https://<my_storage>.file.core.windows.net/;'+
'TableEndpoint=https://<my_storage>.table.core.windows.net/;'+
'SharedAccessSignature=sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2019-09-26T17:12:38Z&st=2019-08-26T09:12:38Z&spr=https&sig=<my_sig>'
)
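For reference, the connection string above is a series of `Key=Value` pairs separated by semicolons, and the `SharedAccessSignature` value is itself a query string of SAS parameters. A small stdlib-only sketch that splits both apart (the storage-account name is a placeholder):

```python
from urllib.parse import parse_qs

# Sketch: split an Azure storage connection string into its parts and decode
# the SAS query parameters. Assumes the standard "Key=Value;Key=Value" format.

def parse_connection_string(conn_str: str) -> dict:
    """Split on ';'; each value may itself contain '=' (e.g. the SAS token)."""
    parts = {}
    for segment in conn_str.split(';'):
        if segment:
            key, _, value = segment.partition('=')
            parts[key] = value
    return parts

conn = parse_connection_string(
    'BlobEndpoint=https://mystorage.blob.core.windows.net/;'
    'FileEndpoint=https://mystorage.file.core.windows.net/;'
    'SharedAccessSignature=sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup'
)
sas = parse_qs(conn['SharedAccessSignature'])
# ss=bfqt -> services blob/file/queue/table; sp=rwdlacup -> granted permissions
print(sas['ss'][0], sas['sp'][0])
```

Note that `ss=bfqt` in the question's token already covers the file service (`f`) as well as blob (`b`), which is why the same account-level SAS works against the File Share endpoint.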
I manage to use this to upload files to the File Share, but not to the File System. Is there any kind of Azure storage that can be accessed by both a SAS token and Azure Databricks?
Steps to connect to an Azure file share from Databricks:
First, install the Microsoft Azure Storage File Share client library for Python in Databricks using pip (`pip install azure-storage-file-share`): https://pypi.org/project/azure-storage-file-share/
After installing, create a storage account. Then you can create a file share from Databricks:
from azure.storage.fileshare import ShareClient
share = ShareClient.from_connection_string(conn_str="<connection_string consists of FileEndpoint=myFileEndpoint(https://storageaccountname.file.core.windows.net/);SharedAccessSignature=sasToken>", share_name="<file share name that you want to create>")
share.create_share()
For further reference, see https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string
Code to upload a file into the file share from Databricks:
from azure.storage.fileshare import ShareFileClient

file_client = ShareFileClient.from_connection_string(conn_str="<connection_string consists of FileEndpoint=myFileEndpoint(https://storageaccountname.file.core.windows.net/);SharedAccessSignature=sasToken>", share_name="<your_fileshare_name>", file_path="my_file")

with open("./SampleSource.txt", "rb") as source_file:
    file_client.upload_file(source_file)
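The example above hardcodes `file_path="my_file"`. When uploading several local files, you typically want to preserve the folder structure in the share. A hypothetical stdlib-only helper (not part of the SDK) that builds the `file_path` value for `ShareFileClient`, as a sketch:

```python
import pathlib

# Hypothetical helper (not part of the Azure SDK): build the `file_path`
# argument for ShareFileClient from a local file, preserving the directory
# layout under a chosen remote root. File-share paths use forward slashes.

def remote_file_path(local_file: str, local_root: str, remote_root: str = "") -> str:
    rel = pathlib.PurePath(local_file).relative_to(local_root)
    parts = (remote_root,) + rel.parts if remote_root else rel.parts
    return "/".join(parts)

print(remote_file_path("data/2019/08/SampleSource.txt", "data", "incoming"))
# incoming/2019/08/SampleSource.txt
```

Each resulting path would then be passed as `file_path=` when constructing a `ShareFileClient`; note that parent directories must already exist in the share before uploading into them.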
Refer to this link for further information: https://pypi.org/project/azure-storage-file-share/