
Kafka Consumer with DLQ and ElasticSearch

I have the following pipeline:

Kafka -> some log collector -> Elasticsearch

My problem is choosing the most efficient log collector (or some other software that can manage dataflows between Kafka and Elasticsearch).

I'm trying to choose between Logstash, Fluentd, and Confluent's Kafka Elasticsearch connector. The main problem I'm facing is the impossibility of rolling back the offset in Kafka after a failure writing to the Elasticsearch endpoint.

For example, the Logstash docs say that "400 and 404 errors are sent to the dead letter queue (DLQ), if enabled. If a DLQ is not enabled, a log message will be emitted, and the event will be dropped" (https://www.elastic.co/guide/en/logstash/6.x/plugins-outputs-elasticsearch.html#_retry_policy). If I hit such an error, Logstash would continue to read data from Kafka, and the error would occur again and again. Even though all my data would be stored in the DLQ, Kafka's offset would have moved far past the position where the first error occurred, and I would have to restore the correct offset manually.

So, my question is: is there any connector for Kafka and Elasticsearch that stops moving the offset after receiving the first error (400/404) from Elasticsearch?

Thanks in advance.

I don't think the question is really about efficiency, but rather reliability.

"The main problem I'm facing is the impossibility of rolling back the offset in Kafka after a failure writing to the Elasticsearch endpoint."

I don't have much experience with the DLQ features of Connect or Logstash, but resetting a consumer group's offsets is not impossible. However, that shouldn't be necessary if the consumer application handles offset commits correctly.
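As a minimal sketch of what correct offset handling looks like with the plain Java consumer (the broker address, topic, group id, and the indexToElasticsearch placeholder are all assumptions, not anything from your setup): auto-commit is disabled, and the offset is committed only after the batch has been indexed, so a failure never advances the committed position past the bad record.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class FailFastIndexer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "es-indexer");              // assumed group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Disable auto-commit: the offset must only advance after a successful write.
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("logs"));                              // assumed topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Hypothetical placeholder for your Elasticsearch client call.
                        // If it throws, the loop exits without committing, so the
                        // group resumes at or before this record on restart.
                        indexToElasticsearch(record.value());
                    }
                    consumer.commitSync(); // commit only once the whole batch has succeeded
                }
            }
        }

        static void indexToElasticsearch(String doc) { /* hypothetical ES call */ }
    }

Connect's sink framework applies the same rule internally.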

If Connect gets a connection error from ES, it will retry and not commit offsets.

If the error is unrecoverable, then Connect will stop consuming and, again, not commit offsets.
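That behavior is controlled by Connect's error-handling options, which are framework-level settings that apply to any sink connector. Below is a sketch of registering the Confluent Elasticsearch sink through the Connect REST API, assuming Connect is reachable on localhost:8083; the connector name, topic names, and URLs are placeholders, and field names like type.name may vary by connector version. With errors.tolerance=none (the default) the task fails fast and stops committing; the commented keys would route failing records to a DLQ topic instead.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RegisterEsSink {
        public static void main(String[] args) throws Exception {
            // Connector config as JSON (Java 15+ text block).
            String config = """
                {
                  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
                  "topics": "logs",
                  "connection.url": "http://localhost:9200",
                  "type.name": "_doc",
                  "key.ignore": "true",
                  "errors.tolerance": "none"
                }
                """;
            // To send bad records to a DLQ topic instead of failing the task:
            //   "errors.tolerance": "all",
            //   "errors.deadletterqueue.topic.name": "logs-dlq",
            //   "errors.deadletterqueue.context.headers.enable": "true"

            // PUT /connectors/<name>/config creates or updates the connector.
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder()
                            .uri(URI.create("http://localhost:8083/connectors/es-sink/config"))
                            .header("Content-Type", "application/json")
                            .PUT(HttpRequest.BodyPublishers.ofString(config))
                            .build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }
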

So, the only way data from a message batch would go missing is if that batch ended up in a DLQ, whichever framework you use.

If the DLQ is disabled, the only way to lose data would be for it to expire out of Kafka's retention.
