
Mounting ADLS gen2 with AAD passthrough in Azure Databricks with Terraform

I am trying to mount my ADLS gen2 storage containers into DBFS, with Azure Active Directory passthrough, using the Databricks Terraform provider. I'm following the instructions here and here, but I'm getting the following error when Terraform attempts to deploy the mount resource:

Error: Could not find ADLS Gen2 Token

My Terraform code looks like the following (it's very similar to the example in the provider documentation), and I am deploying with an Azure Service Principal, which creates the Databricks workspace in the same module:

provider "databricks" {
  host                        = azurerm_databricks_workspace.this.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.this.id
}

data "databricks_node_type" "smallest" {
  local_disk = true

  depends_on = [azurerm_databricks_workspace.this]
}

data "databricks_spark_version" "latest" {
  depends_on = [azurerm_databricks_workspace.this]
}

resource "databricks_cluster" "passthrough" {
  cluster_name            = "terraform-mount"
  spark_version           = data.databricks_spark_version.latest.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 10
  num_workers             = 1

  spark_conf = {
    "spark.databricks.cluster.profile"                = "serverless",
    "spark.databricks.repl.allowedLanguages"          = "python,sql",
    "spark.databricks.passthrough.enabled"            = "true",
    "spark.databricks.pyspark.enableProcessIsolation" = "true"
  }

  custom_tags = {
    "ResourceClass" = "Serverless"
  }
}

resource "databricks_mount" "mount" {
  for_each = toset(var.storage_containers)

  name       = each.value
  cluster_id = databricks_cluster.passthrough.id
  uri        = "abfss://${each.value}@${var.sa_name}.dfs.core.windows.net"

  extra_configs = {
    "fs.azure.account.auth.type"                   = "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class" = "{{sparkconf/spark.databricks.passthrough.adls.gen2.tokenProviderClassName}}",
  }

  depends_on = [
    azurerm_storage_container.data
  ]
}

(For clarity's sake, azurerm_storage_container.data is a set of storage containers with names from var.storage_containers, which are created in the azurerm_storage_account with name var.sa_name; hence the URI.)

I feel like this error is due to a fundamental misunderstanding on my part, rather than a simple omission. My underlying assumption is that I can mount storage containers for the workspace, with AAD passthrough, as a convenience when I deploy the infrastructure in its entirety. That is, whenever users come to use the workspace, any new passthrough cluster will be able to use these mounts with zero setup.

I can mount storage containers manually, following the AAD passthrough instructions: spin up a high-concurrency cluster with passthrough enabled, then mount with dbutils.fs.mount. This is while logged in to the Databricks workspace with my user identity (rather than the Service Principal). Is this the root of the problem? Is a Service Principal not appropriate for this task?
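For reference, the manual mount I run in a notebook on the passthrough cluster looks roughly like this (following the Azure docs; the container, storage account and mount names are placeholders for my actual resources):

# Token provider class is supplied by the passthrough cluster's Spark conf
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Mount the container; placeholder names stand in for my real container/account
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<container-name>",
    extra_configs = configs)

This succeeds when run under my user identity on the passthrough cluster.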

(Interestingly, the Databricks runtime gives me exactly the same error if I try to access files on the manually created mount using a cluster without passthrough enabled.)

Yes, the problem arises from the use of a service principal for that operation. The Azure documentation for credential passthrough says:

You cannot use a cluster configured with ADLS credentials, for example, service principal credentials, with credential passthrough.
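If the mounts still need to be created by the deployment service principal, one option is to mount without passthrough and authenticate the mount with the service principal's own OAuth credentials instead. A minimal notebook sketch of that approach (the tenant ID, application ID, secret scope/key, container and account names are placeholders, and the client secret should come from a secret scope rather than plain text):

# Standard ABFS OAuth (client credentials) configuration for a service principal
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

# Mount using the service principal's credentials (no passthrough)
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<container-name>",
    extra_configs = configs)

With this approach, access on the mount is governed by the service principal's permissions on the storage account rather than by each user's AAD identity, which is the trade-off of dropping passthrough.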
