
How do you get the run parameters and runId within Databricks notebook?

When running a Databricks notebook as a job, you can specify job or run parameters that can be used within the notebook's code. However, it wasn't clear from the documentation how you actually fetch them. I'd like to be able to get all the parameters as well as the job id and run id.

Job/run parameters

When the notebook is run as a job, any job parameters can be fetched as a dictionary using the dbutils package that Databricks automatically provides and imports. Here's the code:

run_parameters = dbutils.notebook.entry_point.getCurrentBindings()

If the job parameters were {"foo": "bar"}, then the result of the code above gives you the dict {'foo': 'bar'}. Note that Databricks only allows job parameter mappings of str to str, so keys and values will always be strings.

Note that if the notebook is run interactively (not as a job), then the dict will be empty. The getCurrentBindings() method also appears to work for getting any active widget values for the notebook (when run interactively).

Getting the jobId and runId

To get the jobId and runId you can get a context JSON from dbutils that contains that information (adapted from the Databricks forum):

import json

# The notebook context is exposed as a JSON string
context_str = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
context = json.loads(context_str)

# runId lives at currentRunId > id; jobId lives at tags > jobId
run_id_obj = context.get('currentRunId', {})
run_id = run_id_obj.get('id', None) if run_id_obj else None
job_id = context.get('tags', {}).get('jobId', None)

So within the context object, the key path for runId is currentRunId > id, and the key path for jobId is tags > jobId.
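As a sketch, the extraction above can be wrapped in a small pure-Python helper. The function name extract_ids and the sample JSON payload below are illustrative only (they follow the key paths just described), not part of the Databricks API:

```python
import json

def extract_ids(context_str):
    """Pull jobId and runId out of a context JSON string.

    Returns (job_id, run_id); either may be None when the notebook
    is run interactively and the keys are absent.
    """
    context = json.loads(context_str)
    run_id_obj = context.get('currentRunId') or {}
    run_id = run_id_obj.get('id')
    job_id = context.get('tags', {}).get('jobId')
    return job_id, run_id

# Hypothetical context payload, following the key paths described above
sample = '{"currentRunId": {"id": 7492}, "tags": {"jobId": "137355915119346"}}'
print(extract_ids(sample))  # ('137355915119346', 7492)
```

In a notebook you would feed it the real string from dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson().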

Nowadays you can easily get the parameters from a job through the widget API. This is described well in the official Databricks documentation. Below, I'll elaborate on the steps you have to take to get there; it is fairly easy.

  1. Create or use an existing notebook that has to accept some parameters. We want to know the job_id and run_id, and let's also add two user-defined parameters, environment and animal.

     # Get parameters from job
     job_id = dbutils.widgets.get("job_id")
     run_id = dbutils.widgets.get("run_id")
     environment = dbutils.widgets.get("environment")
     animal = dbutils.widgets.get("animal")

     print(job_id)
     print(run_id)
     print(environment)
     print(animal)
  2. Now let's go to Workflows > Jobs to create a parameterised job. Make sure you select the correct notebook and specify the parameters for the job at the bottom. According to the documentation, we need to use curly brackets for the parameter values of job_id and run_id. For the other parameters, we can pick a value ourselves.

Note: The reason why you are not allowed to get the job_id and run_id directly from the notebook is security (as you can see from the stack trace when you try to access the attributes of the context). Within a notebook you are in a different context; those parameters live at a "higher" context.
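Following the curly-bracket convention from step 2, the job's parameter mapping might look something like the sketch below. The environment and animal values are just the example values used here; the double-curly-bracket templates are placeholders that Databricks fills in at run time:

```json
{
  "job_id": "{{job_id}}",
  "run_id": "{{run_id}}",
  "environment": "dev",
  "animal": "squirrel"
}
```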

  3. Run the job and observe that it outputs something like:

     dev
     squirrel
     137355915119346
     7492
     Command took 0.09 seconds
  4. You can even set default parameters in the notebook itself; these will be used if you run the notebook interactively or if the notebook is triggered from a job without parameters. This makes testing easier and allows you to default certain values.

     # Adding widgets to a notebook
     dbutils.widgets.text("environment", "tst")
     dbutils.widgets.text("animal", "turtle")

     # Removing widgets from a notebook
     dbutils.widgets.remove("environment")
     dbutils.widgets.remove("animal")

     # Or removing all widgets from a notebook
     dbutils.widgets.removeAll()
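If you would rather not pre-register widgets, one possible pattern is a small wrapper that falls back to a default when the widget is missing (dbutils.widgets.get raises in that case). The helper get_or_default and the stand-in getter below are illustrative only; in a notebook you would pass dbutils.widgets.get as the getter:

```python
def get_or_default(getter, name, default):
    """Return getter(name), falling back to default when the
    lookup raises (e.g. the widget does not exist)."""
    try:
        return getter(name)
    except Exception:
        return default

# Stand-in for dbutils.widgets.get, for illustration only
widgets = {"environment": "tst"}
def fake_get(name):
    return widgets[name]  # raises KeyError when the widget is missing

print(get_or_default(fake_get, "environment", "dev"))  # tst
print(get_or_default(fake_get, "animal", "turtle"))    # turtle
```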
  5. And last but not least, I tested this on different cluster types; so far I have found no limitations. My current settings are:

     spark.databricks.cluster.profile serverless
     spark.databricks.passthrough.enabled true
     spark.databricks.pyspark.enableProcessIsolation true
     spark.databricks.repl.allowedLanguages python,sql
