

How to read data into a databricks notebook from Azure blob using Azure Active Directory (AAD)

I am trying to read data from some containers into my notebook and load it as a Spark or pandas DataFrame. There is some documentation about using the account password, but how can I do it with Azure Active Directory?

Unfortunately, these are the supported methods in Databricks for accessing Azure Blob Storage:

  • Mount an Azure Blob storage container
  • Access Azure Blob storage directly
  • Access Azure Blob storage using the RDD API

Reference: Databricks - Azure Blob Storage

Hope this helps.
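The first two methods listed above can be sketched as below. The account and container names are hypothetical placeholders, and the `dbutils`/`spark` calls are shown as comments because they only execute inside a Databricks notebook; only the path and config-key construction runs anywhere.

```python
storage_account = "mystorageacct"  # hypothetical account name
container = "mycontainer"          # hypothetical container name

# Option 1: mount the container under /mnt (run once per workspace)
source = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
mount_point = f"/mnt/{container}"
# dbutils.fs.mount(
#     source=source,
#     mount_point=mount_point,
#     extra_configs={f"fs.azure.account.key.{storage_account}.blob.core.windows.net": "<account key>"},
# )

# Option 2: set the account key in the Spark conf and read directly
conf_key = f"fs.azure.account.key.{storage_account}.blob.core.windows.net"
# spark.conf.set(conf_key, "<account key>")
# df = spark.read.json(f"{source}/<path to blob>")

print(source)
print(conf_key)
```

Both options authenticate with the storage account key, which is why the answers below focus on retrieving that key.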

There are several official Azure documents about accessing Azure Blob Storage using Azure AD, listed below.

  1. Authorize access to Azure blobs and queues using Azure Active Directory
  2. Authorize access to blobs and queues with Azure Active Directory from a client application
  3. Authorize with Azure Active Directory, in Authorize requests to Azure Storage
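At the core of those documents is an OAuth2 client-credentials token request against the tenant's Azure AD token endpoint, scoped to Azure Storage. A minimal sketch of that request follows; the placeholder values are hypothetical, and the HTTP call itself is left commented out.

```python
tenant_id = "<your tenant id>"
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

# Client-credentials grant for an AAD app registration, scoped to Azure Storage
payload = {
    "grant_type": "client_credentials",
    "client_id": "<your client id>",
    "client_secret": "<your client secret>",
    "scope": "https://storage.azure.com/.default",
}
# import requests
# access_token = requests.post(token_url, data=payload).json()["access_token"]
# The token is then sent as an "Authorization: Bearer <token>" header on
# Blob service REST calls, as document 3 above describes.
print(token_url)
```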

Meanwhile, here is my sample code to get the key (account password) of an Azure Storage account for use in Databricks.

from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.storage import StorageManagementClient
import json

# Please refer to the second document above to get these parameter values
credentials = ServicePrincipalCredentials(
    client_id='<your client id>',
    secret='<your client secret>',
    tenant='<your tenant id>'
)

subscription_id = '<your subscription id>'

client = StorageManagementClient(credentials, subscription_id)

resource_group_name = '<the resource group name of your storage account>'
account_name = '<your storage account name>'

# List the account keys via the management API and parse the raw JSON response
keys_json_text = client.storage_accounts.list_keys(resource_group_name, account_name, raw=True).response.text
keys_json = json.loads(keys_json_text)
# Sample response:
# {"keys":[{"keyName":"key1","value":"xxxxxxxxxx==","permissions":"FULL"},{"keyName":"key2","value":"xxxxxxxxxx==","permissions":"FULL"}]}
key1 = keys_json['keys'][0]['value']
print(key1)
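To make the parsing step concrete, here is the same `json.loads` extraction run against the sample response shape shown in the comment, with dummy strings standing in for the real base64 key material:

```python
import json

# Dummy response in the shape returned by storage_accounts.list_keys(raw=True)
keys_json_text = (
    '{"keys":[{"keyName":"key1","value":"dummykey1==","permissions":"FULL"},'
    '{"keyName":"key2","value":"dummykey2==","permissions":"FULL"}]}'
)
keys_json = json.loads(keys_json_text)
key1 = keys_json['keys'][0]['value']
print(key1)  # → dummykey1==
```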

Then, you can use the account key obtained above to follow the official Azure Databricks document Data > Data Sources > Azure Blob Storage to read data.

Otherwise, you can refer to Steps 1 & 2 of my answer to the other SO thread transform data in azure data factory using python data bricks to read the data, as in the code below.

from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.blob import ContainerPermissions
from datetime import datetime, timedelta

import pandas as pd

account_name = '<your account name>'
account_key = '<your account key>'  # the key comes from the code above
container_name = '<your container name>'

# Generate a read-only SAS token for the container, valid for one hour
service = BaseBlobService(account_name=account_name, account_key=account_key)
token = service.generate_container_shared_access_signature(
    container_name,
    permission=ContainerPermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=1),
)

blob_name = '<your blob name of dataset>'
blob_url_with_token = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}?{token}"

# Read the JSON blob into a pandas DataFrame, then convert it to a Spark
# DataFrame (`spark` is the SparkSession predefined in a Databricks notebook)
pdf = pd.read_json(blob_url_with_token)
df = spark.createDataFrame(pdf)
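As an alternative to the pandas round-trip, the same SAS token can be registered with the WASB connector so Spark reads the blob directly. This is a sketch with hypothetical account and container names; the `spark` calls are commented out because they only run inside a Databricks notebook.

```python
account_name = "mystorageacct"  # hypothetical
container_name = "mycontainer"  # hypothetical

# Per-container SAS configuration key used by the WASB connector
sas_conf_key = f"fs.azure.sas.{container_name}.{account_name}.blob.core.windows.net"
# spark.conf.set(sas_conf_key, token)  # token from the code above
# df = spark.read.json(
#     f"wasbs://{container_name}@{account_name}.blob.core.windows.net/<your blob name of dataset>"
# )
print(sas_conf_key)
```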
