
Mounting ADLS gen2 with AAD passthrough in Azure Databricks with Terraform

I am trying to mount my ADLS gen2 storage containers into DBFS, with Azure Active Directory passthrough, using the Databricks Terraform provider. I'm following the instructions here and here, but I'm getting the following error when Terraform attempts to deploy the mount resource:

Error: Could not find ADLS Gen2 Token

My Terraform code looks like the following (it's very similar to the example in the provider documentation), and I am deploying with an Azure Service Principal, which creates the Databricks workspace in the same module:

provider "databricks" {
  host                        = azurerm_databricks_workspace.this.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.this.id
}

data "databricks_node_type" "smallest" {
  local_disk = true

  depends_on = [azurerm_databricks_workspace.this]
}

data "databricks_spark_version" "latest" {
  depends_on = [azurerm_databricks_workspace.this]
}

resource "databricks_cluster" "passthrough" {
  cluster_name            = "terraform-mount"
  spark_version           = data.databricks_spark_version.latest.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 10
  num_workers             = 1

  spark_conf = {
    "spark.databricks.cluster.profile"                = "serverless",
    "spark.databricks.repl.allowedLanguages"          = "python,sql",
    "spark.databricks.passthrough.enabled"            = "true",
    "spark.databricks.pyspark.enableProcessIsolation" = "true"
  }

  custom_tags = {
    "ResourceClass" = "Serverless"
  }
}

resource "databricks_mount" "mount" {
  for_each = toset(var.storage_containers)

  name       = each.value
  cluster_id = databricks_cluster.passthrough.id
  uri        = "abfss://${each.value}@${var.sa_name}.dfs.core.windows.net"

  extra_configs = {
    "fs.azure.account.auth.type"                   = "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class" = "{{sparkconf/spark.databricks.passthrough.adls.gen2.tokenProviderClassName}}",
  }

  depends_on = [
    azurerm_storage_container.data
  ]
}

(For clarity's sake, azurerm_storage_container.data is a set of storage containers with names from var.storage_containers, which are created in the azurerm_storage_account with name var.sa_name; hence the URI.)

I feel like this error is due to a fundamental misunderstanding on my part, rather than a simple omission. My underlying assumption is that I can mount storage containers for the workspace, with AAD passthrough, as a convenience when I deploy the infrastructure in its entirety. That is, whenever users come to use the workspace, any new passthrough cluster will be able to use these mounts with zero setup.

I can mount storage containers manually, following the AAD passthrough instructions: spin up a high-concurrency cluster with passthrough enabled, then mount with dbutils.fs.mount. This is while logged in to the Databricks workspace with my user identity (rather than the Service Principal). Is this the root of the problem? Is a Service Principal not appropriate for this task?
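For reference, the manual mount I run in a notebook on the passthrough cluster looks roughly like this (following the Azure docs; the container, storage account and mount names are placeholders for my actual resources):

# Token provider class is supplied by the passthrough cluster's Spark conf
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Mount the container; placeholder names stand in for my real container/account
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<container-name>",
    extra_configs = configs)

This succeeds when run under my user identity on the passthrough cluster.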

(Interestingly, the Databricks runtime gives me exactly the same error if I try to access files on the manually created mount using a cluster without passthrough enabled.)

Yes, the problem arises from the use of a service principal for that operation. The Azure documentation for credential passthrough says:

You cannot use a cluster configured with ADLS credentials, for example, service principal credentials, with credential passthrough.
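If the mounts still need to be created by the deployment service principal, one option is to mount without passthrough and authenticate the mount with the service principal's own OAuth credentials instead. A minimal notebook sketch of that approach (the tenant ID, application ID, secret scope/key, container and account names are placeholders, and the client secret should come from a secret scope rather than plain text):

# Standard ABFS OAuth (client credentials) configuration for a service principal
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

# Mount using the service principal's credentials (no passthrough)
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/<container-name>",
    extra_configs = configs)

With this approach, access on the mount is governed by the service principal's permissions on the storage account rather than by each user's AAD identity, which is the trade-off of dropping passthrough.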
