
Azure Databricks: Accessing Blob Storage Behind Firewall

I am reading files on an Azure Blob Storage account (gen 2) from an Azure Databricks Notebook. Both services are in the same region (West Europe). Everything works fine, except when I add a firewall in front of the storage account. I have opted to allow "trusted Microsoft services":

[Screenshot: Azure portal, storage account firewall settings]

However, running the notebook now ends up with an access denied error:

com.microsoft.azure.storage.StorageException: This request is not authorized to perform this operation.

I tried to access the storage directly from Spark and by mounting it with dbutils, with the same result.

I would have assumed that Azure Databricks counts as a trusted Microsoft service. Furthermore, I couldn't find solid information on IP ranges for Databricks regions that could be added to the firewall rules.

Correct: Azure Databricks does not count as a trusted Microsoft service. You can see the supported trusted Microsoft services in the storage account firewall documentation.

From a networking perspective, here are two suggestions:

  1. Find the Azure datacenter IP address ranges (original, now deprecated, URL) and narrow them down to the region where your Azure Databricks workspace is located. Whitelist those IP ranges in the storage account firewall.

  2. Deploy Azure Databricks in your own Azure Virtual Network (Preview), then whitelist that VNet in the storage account firewall; see "Configure Azure Storage firewalls and virtual networks". You can also use an NSG to restrict inbound and outbound traffic for this VNet. Note that this only works when Azure Databricks is deployed into your own VNet, not with the default deployment.
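A minimal Python sketch of suggestion 1, assuming you have downloaded the "Azure IP Ranges and Service Tags - Public Cloud" JSON file. It picks the IPv4 prefixes of one regional tag and turns them into `az storage account network-rule add` commands. The embedded sample, the tag names, the resource group and the account name are illustrative placeholders; the prefixes are documentation ranges, not real Azure ranges. Be aware that the Azure Storage documentation states that IP network rules have no effect on requests originating from the same Azure region as the storage account. With both services in West Europe, suggestion 2 (VNet rules) is therefore the one that applies to this question.

```python
import ipaddress
import json

# Illustrative excerpt shaped like the public "Service Tags" JSON file.
# Assumption: tag names and prefixes are placeholders, NOT real Azure data.
SAMPLE_SERVICE_TAGS = json.loads("""
{
  "values": [
    {"name": "AzureCloud.westeurope",
     "properties": {"region": "westeurope",
                    "addressPrefixes": ["198.51.100.0/24", "2001:db8::/64"]}},
    {"name": "AzureCloud.northeurope",
     "properties": {"region": "northeurope",
                    "addressPrefixes": ["203.0.113.0/24"]}}
  ]
}
""")


def ipv4_prefixes_for_tag(service_tags, tag_name):
    """Return the IPv4 prefixes listed under one service tag (empty if absent)."""
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            prefixes = entry.get("properties", {}).get("addressPrefixes", [])
            # Keep IPv4 only; the storage firewall IP rules are written for IPv4.
            return [p for p in prefixes
                    if ipaddress.ip_network(p, strict=False).version == 4]
    return []


def az_cli_commands(resource_group, account_name, prefixes):
    """Build one Azure CLI network-rule command per prefix (not executed here)."""
    return [
        f"az storage account network-rule add --resource-group {resource_group} "
        f"--account-name {account_name} --ip-address {prefix}"
        for prefix in prefixes
    ]


if __name__ == "__main__":
    # Hypothetical resource group and storage account names.
    prefixes = ipv4_prefixes_for_tag(SAMPLE_SERVICE_TAGS, "AzureCloud.westeurope")
    for command in az_cli_commands("my-rg", "mystorageaccount", prefixes):
        print(command)
```

The commands are only printed so they can be reviewed before running. Regional ranges are large and the file is republished regularly, so the rule list has to be refreshed over time.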

Hope this helps.

The described scenario only works if you deploy Azure Databricks in your own Azure Virtual Network (VNet). This lets you use service endpoints, so you can add your Databricks VNet to the Blob Storage firewall. With the default deployment this is not supported. See the following documentation for more details and a description of how to get the VNet injection feature enabled.
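With VNet injection, the storage firewall entry is a virtual network rule that points at subnet resource IDs, and those subnets need the `Microsoft.Storage` service endpoint enabled. The Python sketch below builds the `networkAcls` object of a `Microsoft.Storage/storageAccounts` ARM resource for that setup. The subscription ID, resource group, VNet and subnet names are hypothetical, and allowing both of the workspace's subnets (commonly a "public"/host and a "private"/container subnet) is an assumption to check against your own deployment.

```python
import json


def subnet_id(subscription_id, resource_group, vnet_name, subnet_name):
    """Build a subnet resource ID in the standard ARM format."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
            f"/subnets/{subnet_name}")


def storage_network_acls(subnet_ids, ip_ranges=(), bypass="AzureServices"):
    """Return a networkAcls object for a storage account.

    defaultAction "Deny" enables the firewall; only the listed subnets and IP
    ranges are allowed. bypass "AzureServices" corresponds to the "trusted
    Microsoft services" exception, which does not cover Azure Databricks.
    """
    return {
        "bypass": bypass,
        "defaultAction": "Deny",
        "virtualNetworkRules": [{"id": sid, "action": "Allow"} for sid in subnet_ids],
        "ipRules": [{"value": ip, "action": "Allow"} for ip in ip_ranges],
    }


if __name__ == "__main__":
    # Hypothetical identifiers; the subnets must already have the
    # Microsoft.Storage service endpoint enabled.
    sub = "00000000-0000-0000-0000-000000000000"
    subnets = [
        subnet_id(sub, "my-rg", "databricks-vnet", "databricks-public"),
        subnet_id(sub, "my-rg", "databricks-vnet", "databricks-private"),
    ]
    print(json.dumps(storage_network_acls(subnets), indent=2))
```

Keeping `defaultAction` at `Deny` while adding only subnet rules means traffic from anywhere else, including a default (non-injected) Databricks deployment, is still refused.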

Enabling the mentioned exception does not work, because Azure Databricks is not in the list of trusted services for Blob Storage. See the following documentation for which services can still access the storage account with the exception enabled.

