How can I set an alert on a Spark Streaming job if no records have been pushed in the last hour?

I have a Spark Streaming job that reads and processes data from a Solace queue. I want to raise an alert if no data has been consumed in the last hour. The batch window is currently set to 1 minute. How can I add an alert that fires when no data has been consumed for a full hour, so that the source can be notified?


You can keep track of this by saving the timestamp of the last received record in an HDFS file. Then, while processing each micro-batch, if the RDD is empty and the difference between the current time and the timestamp in HDFS is an hour or more, you can send a mail using your mailing service. If a micro-batch does contain records, update the timestamp in the HDFS file accordingly. Your code will look something like the snippet below. You need to implement getTimeStampFromHDFS(), which returns the timestamp stored in the HDFS file, updateTimestampHDFS(currentTimestamp), which updates that timestamp whenever a micro-batch contains records, and sendMailAlert(), which calls your mailing service.

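dstream.foreachRDD { rdd =>
  val now = System.currentTimeMillis()
  if (rdd.isEmpty()) {
    // No records in this batch: alert if the last record is at least an hour old.
    val oneHourMs = 60 * 60 * 1000L
    if (now - getTimeStampFromHDFS() >= oneHourMs) sendMailAlert()
  } else {
    // Records arrived: remember when data was last seen.
    updateTimestampHDFS(now)
  }
}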
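
Note that the check above fires on every empty batch once the hour has passed, so with a 1-minute batch window it would send a mail every minute. A minimal sketch of an alert-once variant follows. IdleTracker and onBatch are hypothetical names, not part of Spark, and the state is kept in driver memory here rather than in HDFS, so it is lost if the driver restarts.

```scala
// Hypothetical helper, not part of Spark: remembers when data was last seen
// and reports at most one alert per silent period.
final class IdleTracker(maxIdleMs: Long, startMs: Long) {
  private var lastDataMs: Long = startMs
  private var alerted: Boolean = false

  /** Call once per micro-batch; returns true only when a new alert is due. */
  def onBatch(hasData: Boolean, nowMs: Long): Boolean = synchronized {
    if (hasData) {
      // Data arrived: reset the clock and re-arm the alert.
      lastDataMs = nowMs
      alerted = false
      false
    } else if (!alerted && nowMs - lastDataMs >= maxIdleMs) {
      // First empty batch past the limit: alert once, then stay quiet.
      alerted = true
      true
    } else {
      false
    }
  }
}
```

Since the body of foreachRDD runs on the driver, one way to wire it in is to create the tracker once, for example `val tracker = new IdleTracker(60 * 60 * 1000L, System.currentTimeMillis())`, and then call `if (tracker.onBatch(!rdd.isEmpty(), System.currentTimeMillis())) sendMailAlert()` inside foreachRDD.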

You have several options to do that:

  • Add your own implementation of StreamingListener to the StreamingContext of your job. Then override onBatchCompleted to access the BatchInfo, which gives you the batch time and the number of records in the batch. With that information you can track what happens and raise an alarm if no data has been received during a specific period.

  • You can also use the REST API that Spark provides for monitoring. You check it from outside the job and raise the alarm if needed. For instance, it might be helpful to check /applications/[app-id]/streaming/batches.
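
The listener option could be sketched roughly as follows, assuming the Spark Streaming (DStream) API is on the classpath. IdleAlertListener and the alert callback are hypothetical names introduced for illustration. The sketch also assumes the input stream reports record counts, so that BatchInfo.numRecords is non-zero whenever data arrives; that is worth verifying for a custom Solace receiver.

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical listener: `alert` is a callback you supply (mail, pager, ...).
// It receives the idle time in milliseconds.
class IdleAlertListener(maxIdleMs: Long, alert: Long => Unit) extends StreamingListener {
  private var lastDataMs: Long = -1L   // batch time of the last non-empty batch
  private var alerted: Boolean = false // true once the current silence was reported

  override def onBatchCompleted(event: StreamingListenerBatchCompleted): Unit = synchronized {
    val info = event.batchInfo
    val batchMs = info.batchTime.milliseconds
    if (lastDataMs < 0) lastDataMs = batchMs // start the clock at the first batch
    if (info.numRecords > 0) {
      lastDataMs = batchMs
      alerted = false
    } else if (!alerted && batchMs - lastDataMs >= maxIdleMs) {
      alerted = true
      alert(batchMs - lastDataMs)
    }
  }
}
```

It would be registered before the context is started, for example `ssc.addStreamingListener(new IdleAlertListener(60 * 60 * 1000L, _ => sendMailAlert()))`, where ssc is your StreamingContext and sendMailAlert() is your own mail routine. The listener alerts once per silent period and re-arms as soon as a non-empty batch completes.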
