
In a Spark Streaming job, how to collect error messages from executors to the driver and log them at the end of each streaming batch?

I want to log all the error messages on the driver machine. How can I do this efficiently?

Basically, you need a custom logger that writes all your application-specific logs to a separate file instead of to the YARN logs.

https://mapr.com/blog/how-log-apache-spark/

The URL above explains how to implement a custom logger mechanism in Apache Spark.
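As a rough illustration of the custom-logger idea (a sketch under stated assumptions, not necessarily the approach the linked post takes), a dedicated named logger can be kept in a holder object so that it is created in each JVM rather than serialized with a closure. The names `AppLog`, `LoggingExample`, and `myapp.errors` are hypothetical, and sending that logger to its own file still depends on an appender defined in whatever log4j configuration your Spark version uses.

```scala
import org.slf4j.{Logger, LoggerFactory}

// Holder object: initialised lazily in each JVM (driver or executor),
// so the logger is never shipped inside a serialised closure.
object AppLog {
  // "myapp.errors" is a hypothetical logger name; map it to a separate
  // file appender in your logging configuration.
  lazy val log: Logger = LoggerFactory.getLogger("myapp.errors")
}

object LoggingExample {
  def main(args: Array[String]): Unit = {
    // Application-specific messages go through the dedicated logger.
    AppLog.log.error("batch failed: example message")
  }
}
```

Code running on an executor that calls `AppLog.log` writes to that executor's log output, so this alone does not move messages to the driver.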

You can put a try/catch block around your code and print the stack trace.

  try {
    // your code
  } catch {
    case e: Exception =>
      // On an executor this prints to that executor's stderr, not the driver.
      e.printStackTrace()
  }
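Printing the stack trace inside executor code leaves the message in the executor's own output. One way to carry failures back as ordinary data is to wrap the work in `scala.util.Try` and return an `Either`. This is a minimal sketch; `ErrorCapture`, `parseRecord`, and `process` are hypothetical names standing in for your own code.

```scala
import scala.util.{Failure, Success, Try}

object ErrorCapture {
  // Hypothetical per-record logic; stands in for "your code".
  def parseRecord(raw: String): Int = raw.trim.toInt

  // Run the logic and turn any non-fatal exception into a Left holding a
  // short message, so the failure can be returned like any other value.
  def process(raw: String): Either[String, Int] =
    Try(parseRecord(raw)) match {
      case Success(value) => Right(value)
      case Failure(e) =>
        Left(s"record '$raw' failed: ${e.getClass.getName}: ${e.getMessage}")
    }

  def main(args: Array[String]): Unit = {
    val results = Seq("1", "x", "3").map(process)
    val errors  = results.collect { case Left(msg) => msg }
    val values  = results.collect { case Right(v) => v }
    println(s"values=$values, error count=${errors.size}")
    errors.foreach(println)
  }
}
```

In a Spark job the same `process` function could be applied with `map` on an RDD, with the `Left` values filtered and collected on the driver, provided the number of errors per batch is small enough to fit in driver memory.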

Depending on the number of messages to be collected, you could consider doing this with an Accumulator. The built-in ListAccumulator seems to be a prime candidate.
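A minimal sketch of the accumulator approach follows. It assumes Spark 2.x or later, where to my knowledge the built-in list-collecting accumulator is exposed as `CollectionAccumulator` and created with `SparkContext.collectionAccumulator`. `BatchErrorLog`, `handle`, and `processBatch` are hypothetical names.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.util.CollectionAccumulator
import scala.collection.JavaConverters._
import scala.util.control.NonFatal

object BatchErrorLog {
  // Hypothetical per-record work; replace with the real processing.
  def handle(record: String): Unit = { record.trim.toInt; () }

  // Processes one batch and returns the error messages gathered from executors.
  def processBatch(sc: SparkContext, rdd: RDD[String]): Seq[String] = {
    // A fresh accumulator per batch, so each batch starts with an empty list.
    val errors: CollectionAccumulator[String] =
      sc.collectionAccumulator[String]("batchErrors")

    rdd.foreach { record =>
      try handle(record)
      catch { case NonFatal(e) => errors.add(s"$record: $e") }
    }

    // foreach is an action, so by the time it returns the executor-side
    // updates have been merged and can be read on the driver.
    errors.value.asScala.toList
  }
}
```

In a streaming job this could be called from `foreachRDD`, whose body runs on the driver, for example `stream.foreachRDD { rdd => BatchErrorLog.processBatch(rdd.sparkContext, rdd).foreach(msg => println(msg)) }`, where `stream` is an assumed `DStream[String]` and `println` stands in for your driver-side logger. All collected messages are held in driver memory, so this suits a modest number of errors per batch.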
