
Logging Attached Cluster Information in Databricks / Spark

I would like to do some performance testing on Databricks. To do this, I want to log which cluster (VM type, e.g. Standard_DS3_v2) I was using during the test (we can assume that the driver and worker nodes are the same). I know I could log the number of workers, the number of cores (on the driver at least), and the memory (on the driver at least). However, I would like to know the VM type, since I want to be able to tell whether I used, for example, a storage-optimized or a general-purpose cluster. That category would also be fine in place of the exact VM type. Ideally, I can get this information as a string in a variable within the notebook, so that I can later write it to my log file together with the other information I am logging. However, I am also happy with any hacky workaround if there is no straightforward solution.

You can get this information from the REST API, via a GET request to the Clusters API. To identify the cluster the notebook is running on, use the notebook context: the dbutils.notebook.getContext call returns a map of attributes that includes the cluster ID and the workspace domain name, and you can also extract the authentication token from it. Here is code that prints the driver and worker node types (it's in Python, but the Scala version should look much the same; I often use Scala's dbutils.notebook.getContext.tags to find out which tags are available):

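import requests

# Read the workspace host, API token, and cluster ID from the notebook context.
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host_name = ctx.tags().get("browserHostName").get()
host_token = ctx.apiToken().get()
cluster_id = ctx.tags().get("clusterId").get()

# Ask the Clusters API for the definition of the cluster this notebook runs on.
resp = requests.get(
    f"https://{host_name}/api/2.0/clusters/get",
    params={"cluster_id": cluster_id},
    headers={"Authorization": f"Bearer {host_token}"},
)
resp.raise_for_status()  # fail loudly instead of hitting a KeyError on an error payload
response = resp.json()
print(f"driver type={response['driver_node_type_id']} worker type={response['node_type_id']}")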
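
Since the question asks for the information as a string in a variable that can be written to a log file, here is a minimal sketch of how the parsed JSON from the clusters/get call might be reduced to one log line. The helper names summarize_cluster and to_log_line are hypothetical, and sample_response is made-up data standing in for the real response so the snippet runs outside Databricks; the field names used (cluster_id, node_type_id, driver_node_type_id, num_workers, autoscale, spark_version) are ones the Clusters API returns, but which of them you log is up to you.

```python
import json
from datetime import datetime, timezone


def summarize_cluster(response: dict) -> dict:
    """Pick the fields worth logging out of a parsed clusters/get response."""
    worker_type = response.get("node_type_id")
    # Assumption: if no separate driver type is reported, treat it as the worker type.
    driver_type = response.get("driver_node_type_id") or worker_type
    autoscale = response.get("autoscale")
    if autoscale:
        # Autoscaling clusters report a worker range rather than a fixed count.
        workers = f"{autoscale.get('min_workers')}-{autoscale.get('max_workers')} (autoscale)"
    else:
        workers = str(response.get("num_workers"))
    return {
        "cluster_id": response.get("cluster_id"),
        "driver_type": driver_type,
        "worker_type": worker_type,
        "workers": workers,
        "spark_version": response.get("spark_version"),
    }


def to_log_line(summary: dict) -> str:
    """Serialize the summary as one JSON line with a UTC timestamp."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return json.dumps({"logged_at": timestamp, **summary}, sort_keys=True)


# Hypothetical sample payload; in a notebook, pass the real parsed response instead.
sample_response = {
    "cluster_id": "0000-000000-example",
    "node_type_id": "Standard_DS3_v2",
    "driver_node_type_id": "Standard_DS3_v2",
    "num_workers": 4,
    "spark_version": "example-runtime",
}

cluster_info = summarize_cluster(sample_response)
log_line = to_log_line(cluster_info)
print(log_line)
```

In the notebook you would call summarize_cluster on the dictionary returned by the request above and append log_line to whatever file the rest of your test results go to.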
