
databricks init script to mount dbfs on adls

I am using a Python notebook to mount dbfs on adls; now I want to add this to the init script so the mount is created when the job cluster starts.

This is the Python code I am using; how can I make it run as an init script?

environment = "development"
scopeCredentials = "test-" + environment

# Secrets
# ADLS
app_id = dbutils.secrets.get(scope=scopeCredentials, key="app_id")
key = dbutils.secrets.get(scope=scopeCredentials, key="key")
adls_name = dbutils.secrets.get(scope=scopeCredentials, key="adls-name")

# Configs
# ADLS
adls_configs = {
  "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
  "dfs.adls.oauth2.client.id": app_id, #id is the AppId of the service principal
  "dfs.adls.oauth2.credential": key,
  "dfs.adls.oauth2.refresh.url": "url"
}

mount_point="mount_point"
if any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  print("Storage: " + mount_point + " already mounted")
else:
  try:
    dbutils.fs.mount(
      source = "source",
      mount_point = mount_point,  # use the variable, not the literal string
      extra_configs = adls_configs)
    print("Storage: " + mount_point + " successfully mounted")
  except Exception as e:  # avoid a bare except that swallows the real error
    print("Storage: " + mount_point + " not mounted: " + str(e))

Any idea how to change this so it can run as a bash init script?

Mounting of the storage needs to be done only once, or when you change the credentials of the service principal. Unmounting & remounting during execution may lead to problems when somebody else is using that mount from another cluster.

If you really want to access the storage only from that cluster, then you need to configure those properties in the cluster's Spark conf and access the data directly using abfss://... URIs (see the docs for details). Mounting the storage just for the lifetime of the cluster doesn't make sense from a security perspective, because during that time anyone in the workspace can access the mounted data: a mount is global, not local to a cluster.
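As a minimal sketch of that per-cluster approach: the Spark conf entries below follow the ADLS Gen2 OAuth property pattern, using Databricks secret references (`{{secrets/<scope>/<key>}}`) so no credential is written in plain text. The storage account name, secret scope, and secret key names here are assumptions, and `<tenant-id>` is a placeholder you would fill in.

```python
# Hypothetical sketch: build the Spark conf entries for direct abfss:// access
# to an ADLS Gen2 account via a service principal (OAuth). These key/value
# pairs would be pasted into the cluster's Spark Conf, not run as an init script.
def adls_oauth_conf(storage_account, scope):
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        # secret references are resolved by Databricks at cluster start
        f"fs.azure.account.oauth2.client.id.{suffix}": f"{{{{secrets/{scope}/app_id}}}}",
        f"fs.azure.account.oauth2.client.secret.{suffix}": f"{{{{secrets/{scope}/key}}}}",
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }
```

With these properties set on the cluster, data is read with `spark.read.load("abfss://<container>@<storage_account>.dfs.core.windows.net/<path>")` and only that cluster can reach the storage, unlike a workspace-wide mount.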
