
How do you get the run parameters and runId within Databricks notebook?

When running a Databricks notebook as a job, you can specify job or run parameters that can be used within the notebook's code. However, it wasn't clear from the documentation how you actually fetch them. I'd like to be able to get all the parameters as well as the job id and run id.

Job/run parameters

When the notebook is run as a job, any job parameters can be fetched as a dictionary using the dbutils package that Databricks automatically provides and imports. Here's the code:

run_parameters = dbutils.notebook.entry_point.getCurrentBindings()

If the job parameters were {"foo": "bar"}, then the result of the code above gives you the dict {'foo': 'bar'}. Note that Databricks only allows job parameter mappings of str to str, so keys and values will always be strings.

Note that if the notebook is run interactively (not as a job), then the dict will be empty. The getCurrentBindings() method also appears to work for getting any active widget values for the notebook (when run interactively).

Getting the jobId and runId

To get the jobId and runId you can get a context JSON from dbutils that contains that information (adapted from the Databricks forum):

import json

# The notebook context is exposed as a JSON string
context_str = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
context = json.loads(context_str)

# runId lives at currentRunId > id; jobId lives at tags > jobId
run_id_obj = context.get('currentRunId', {})
run_id = run_id_obj.get('id', None) if run_id_obj else None
job_id = context.get('tags', {}).get('jobId', None)

So within the context object, the key path for runId is currentRunId > id, and the key path for jobId is tags > jobId.
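As a sketch, the extraction above can be wrapped in a small pure-Python helper. The function name extract_ids and the sample JSON payload below are illustrative only (they follow the key paths just described), not part of the Databricks API:

```python
import json

def extract_ids(context_str):
    """Pull jobId and runId out of a context JSON string.

    Returns (job_id, run_id); either may be None when the notebook
    is run interactively and the keys are absent.
    """
    context = json.loads(context_str)
    run_id_obj = context.get('currentRunId') or {}
    run_id = run_id_obj.get('id')
    job_id = context.get('tags', {}).get('jobId')
    return job_id, run_id

# Hypothetical context payload, following the key paths described above
sample = '{"currentRunId": {"id": 7492}, "tags": {"jobId": "137355915119346"}}'
print(extract_ids(sample))  # ('137355915119346', 7492)
```

In a notebook you would feed it the real string from dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson().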

Nowadays you can easily get the parameters from a job through the widget API. This is described well in the official Databricks documentation. Below, I'll elaborate on the steps you have to take to get there; it is fairly easy.

  1. Create or use an existing notebook that has to accept some parameters. We want to know the job_id and run_id, and let's also add two user-defined parameters, environment and animal.

     # Get parameters from job
     job_id = dbutils.widgets.get("job_id")
     run_id = dbutils.widgets.get("run_id")
     environment = dbutils.widgets.get("environment")
     animal = dbutils.widgets.get("animal")

     print(job_id)
     print(run_id)
     print(environment)
     print(animal)
  2. Now let's go to Workflows > Jobs to create a parameterised job. Make sure you select the correct notebook and specify the parameters for the job at the bottom. According to the documentation, we need to use curly brackets for the parameter values of job_id and run_id. For the other parameters, we can pick a value ourselves.

Note: The reason why you are not allowed to get the job_id and run_id directly from the notebook is security (as you can see from the stack trace when you try to access the attributes of the context). Within a notebook you are in a different context; those parameters live at a "higher" context.
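Following the curly-bracket convention from step 2, the job's parameter mapping might look something like the sketch below. The environment and animal values are just the example values used here; the double-curly-bracket templates are placeholders that Databricks fills in at run time:

```json
{
  "job_id": "{{job_id}}",
  "run_id": "{{run_id}}",
  "environment": "dev",
  "animal": "squirrel"
}
```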

  3. Run the job and observe that it outputs something like:

     dev
     squirrel
     137355915119346
     7492
     Command took 0.09 seconds
  4. You can even set default parameters in the notebook itself; these will be used if you run the notebook interactively or if the notebook is triggered from a job without parameters. This makes testing easier and allows you to default certain values.

     # Adding widgets to a notebook
     dbutils.widgets.text("environment", "tst")
     dbutils.widgets.text("animal", "turtle")

     # Removing widgets from a notebook
     dbutils.widgets.remove("environment")
     dbutils.widgets.remove("animal")

     # Or removing all widgets from a notebook
     dbutils.widgets.removeAll()
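If you would rather not pre-register widgets, one possible pattern is a small wrapper that falls back to a default when the widget is missing (dbutils.widgets.get raises in that case). The helper get_or_default and the stand-in getter below are illustrative only; in a notebook you would pass dbutils.widgets.get as the getter:

```python
def get_or_default(getter, name, default):
    """Return getter(name), falling back to default when the
    lookup raises (e.g. the widget does not exist)."""
    try:
        return getter(name)
    except Exception:
        return default

# Stand-in for dbutils.widgets.get, for illustration only
widgets = {"environment": "tst"}
def fake_get(name):
    return widgets[name]  # raises KeyError when the widget is missing

print(get_or_default(fake_get, "environment", "dev"))  # tst
print(get_or_default(fake_get, "animal", "turtle"))    # turtle
```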
  5. And last but not least, I tested this on different cluster types; so far I have found no limitations. My current settings are:

     spark.databricks.cluster.profile serverless
     spark.databricks.passthrough.enabled true
     spark.databricks.pyspark.enableProcessIsolation true
     spark.databricks.repl.allowedLanguages python,sql
