
Real Time Cluster Log Delivery in a Databricks Cluster

I have some Python code that I am running on a Databricks job cluster. My Python code will be generating a whole bunch of logs, and I want to be able to monitor these logs in real time (or near real time), say through something like a dashboard.

What I have done so far is configure my cluster log delivery location, so my logs are delivered to the specified destination every 5 minutes.

This is explained here:
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
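For reference, this is roughly what that configuration looks like when a cluster is created through the Clusters REST API. This is a minimal sketch rather than an example from the article; the workspace URL, token, node type, and destination path are placeholders for my own values:

import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"  # placeholder

cluster_spec = {
    "cluster_name": "log-delivery-demo",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    # Driver, worker, and event logs are delivered to this destination
    # every five minutes (the interval this question is about).
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs"}
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())  # {"cluster_id": "..."}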

Here is an extract from the same article:

When you create a cluster, you can specify a location to deliver the logs for the Spark driver node, worker nodes, and events. Logs are delivered every five minutes to your chosen destination. When a cluster is terminated, Azure Databricks guarantees to deliver all logs generated up until the cluster was terminated.
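The delivered files land under <destination>/<cluster-id>/ in driver, executor, and eventlog subdirectories, so a quick check from a notebook confirms what has arrived so far. A sketch (dbfs:/cluster-logs is the destination I configured above):

# Runs in a Databricks notebook, where `spark` and `dbutils` are predefined.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
for entry in dbutils.fs.ls(f"dbfs:/cluster-logs/{cluster_id}/driver/"):
    print(entry.path, entry.size)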

Is there some way I can have these logs delivered somewhere in near real time, rather than every 5 minutes? It does not have to be through the same method either; I am open to other possibilities.

As shown in the screenshot below, the default is 5 minutes. Unfortunately, it cannot be changed; the official documentation gives no way to adjust this interval.

[Screenshot referenced by the answer: cluster log delivery configuration, with delivery fixed at every 5 minutes]

However, you can raise a feature request here.
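In the meantime, one workaround for the near-real-time requirement is to ship your application logs yourself from inside the job, instead of waiting for cluster log delivery. The sketch below is my own suggestion, not an official Databricks mechanism: a standard Python logging handler forwards each record to Azure Event Hubs (using the azure-eventhub package), which a dashboard can then consume as a stream. The connection string and hub name are placeholders:

import logging
from azure.eventhub import EventHubProducerClient, EventData

class EventHubHandler(logging.Handler):
    """Forwards each log record to an Event Hub as it is emitted."""

    def __init__(self, conn_str: str, eventhub_name: str):
        super().__init__()
        self.producer = EventHubProducerClient.from_connection_string(
            conn_str=conn_str, eventhub_name=eventhub_name
        )

    def emit(self, record: logging.LogRecord) -> None:
        try:
            batch = self.producer.create_batch()
            batch.add(EventData(self.format(record)))
            self.producer.send_batch(batch)
        except Exception:
            self.handleError(record)

logger = logging.getLogger("job")
logger.setLevel(logging.INFO)
handler = EventHubHandler(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;...",  # placeholder
    eventhub_name="<hub-name>",  # placeholder
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("visible on a dashboard within seconds, not every 5 minutes")

Sending one event per record keeps the sketch simple; a real job would buffer records and flush them periodically to reduce overhead.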
