
Pyspark UDF monitoring with prometheus

I am trying to monitor some logic in a UDF using counters.

i.e.:

counter = Counter(...)

@udf
def do_smthng(col):
  if col:
    counter.labels("not_null").inc()
  else:
    counter.labels("null").inc()
  return col

This is not the real case, but you should get the idea. I have followed this article: https://kb.databricks.com/metrics/spark-metrics.html

I have so far tried:

  • Using a global prometheus counter (failed with "Lock is not picklable")
  • Creating a custom source using py4j:

# noinspection PyPep8Naming
class CustomMetrics:
    def __init__(self, sourceName, metricRegistry):
        self.metricRegistry = metricRegistry
        self.sourceName = sourceName

    class Java:
        implements = ["org.apache.spark.metrics.source.Source"]

py_4j_gateway = spark_session.sparkContext._gateway
metric_registry = py_4j_gateway.jvm.com.codahale.metrics.MetricRegistry()
SparkEnv = py_4j_gateway.jvm.org.apache.spark.SparkEnv
custom_metrics_provider = CustomMetrics("spark.ingest.custom", metric_registry)

Which failed with the same error. I also can't get `SparkEnv.get.metricsSystem`, so I can't register the custom metrics client in any case.

Is there no way for me to access the internal metric registry from Python? I am starting to wonder how people monitor Spark pipelines with custom metrics.

Spark 3.1.2, Python 3.8, x86, MacBook Pro M1 Pro

Why don't you use an accumulator? It's made to be accessible and is perfect for counting things. It's a holdover from MapReduce, which used counters to collect metrics before Spark was invented.

Your accumulators can then be exposed as a sink via a `PrometheusServlet`.
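For reference, enabling the `PrometheusServlet` sink is a metrics configuration change; this is a sketch based on the standard properties from the Spark monitoring docs (the paths are the conventional defaults, adjust to taste):

```properties
# conf/metrics.properties: expose driver/executor metrics for Prometheus scraping
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
```

Prometheus can then scrape the metrics endpoint on the Spark UI port.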

From the Spark monitoring docs, under `namespace=AccumulatorSource` (note: user-configurable sources that attach accumulators to the metric system): `DoubleAccumulatorSource`, `LongAccumulatorSource`.

