
Storm Bolts acking but spout is failing

I'm having an odd issue with Apache Storm. I have a KafkaSpout hooked up to a Kafka cluster with 10 messages in it.

The bolts receive each message and appear to process it correctly, because the Storm UI lists the tuples as 'acked'. However, the spout's entry in the Storm UI says that all of the tuples failed.

I believe this causes the spout to re-emit all of the messages. I am seeing a Storm bolt print out messages 1-10 and then print them out again in the same order, over and over.

I am calling the .ack() and .fail() methods appropriately; I just don't know why the spout would be listing the tuples as failed.

Any thoughts?

It turns out that a couple of bolts downstream were not acking when they finished processing a tuple. This caused the spout tuple to fail and eventually be sent again, which resulted in a continuous loop.

When the spout reads a message and passes it to the bolts, the message must complete full processing (in all relevant bolts) within TOPOLOGY_MESSAGE_TIMEOUT_SECS ("topology.message.timeout.secs").

All relevant bolts must ack; the acker then tells the spout that the message was processed (in the case of the Kafka spout, the spout can then advance the offset).
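To make the "all relevant bolts must ack" rule concrete, here is a toy model in plain Java. It is only a sketch and not Storm's actual acker: the class `TupleTreeModel`, its methods, and the bolt names are all made up for illustration. It shows that a single bolt that never acks is enough for the spout to see the tuple as failed once the timeout elapses, even though the other bolts report acks.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of one spout tuple's fate; hypothetical, not Storm's acker. */
class TupleTreeModel {
    enum Outcome { ACKED, FAILED, TIMED_OUT }

    // Bolts that still have to ack before the spout tuple counts as done.
    private final Set<String> pendingBolts;
    private boolean failed = false;

    TupleTreeModel(List<String> boltsThatMustAck) {
        this.pendingBolts = new HashSet<>(boltsThatMustAck);
    }

    /** A bolt finished its work and acked. */
    void ack(String bolt) {
        pendingBolts.remove(bolt);
    }

    /** A bolt explicitly failed the tuple. */
    void fail(String bolt) {
        failed = true;
    }

    /** What the spout would learn once the message timeout has elapsed. */
    Outcome outcomeAtTimeout() {
        if (failed) {
            return Outcome.FAILED;
        }
        return pendingBolts.isEmpty() ? Outcome.ACKED : Outcome.TIMED_OUT;
    }

    public static void main(String[] args) {
        // Hypothetical bolt names.
        TupleTreeModel tuple =
                new TupleTreeModel(Arrays.asList("parse", "enrich", "store"));
        tuple.ack("parse");
        tuple.ack("enrich");
        // "store" never acks, so the tuple times out and the spout replays it.
        System.out.println(tuple.outcomeAtTimeout()); // prints TIMED_OUT
    }
}
```

In this model the first two bolts would show up as 'acked' while the spout still ends up with a failure, which matches the symptom described in the question.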

If you see SPOUT Failing in the logs, the likely causes are:

  1. One of your bolts failed the message.
  2. One of your bolts did not ack.
  3. The bolts did not finish handling the message within topology.message.timeout.secs, so an ack was not sent in time.

Example of #3: suppose you have 5 bolts and each takes about 10 seconds because of db connection issues. By the time bolt #3 finishes you have used up the default 30-second Storm timeout, so the message fails. The spout will then replay this message.
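The arithmetic in that example can be checked with a few lines of Java. This is only a sketch: `TimeoutBudget` and `boltWhereTimeoutHits` are hypothetical names, and it assumes the bolts run strictly one after another and that reaching the timeout exactly already counts as a miss.

```java
/** Hypothetical helper for reasoning about a sequential chain of bolts. */
class TimeoutBudget {

    /**
     * Returns the 1-based index of the bolt at which the cumulative
     * processing time reaches the timeout, or -1 if the whole chain
     * finishes before the timeout.
     */
    static int boltWhereTimeoutHits(int[] boltSeconds, int timeoutSeconds) {
        int elapsed = 0;
        for (int i = 0; i < boltSeconds.length; i++) {
            elapsed += boltSeconds[i];
            if (elapsed >= timeoutSeconds) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] fiveSlowBolts = {10, 10, 10, 10, 10}; // about 10 s per bolt
        // Default timeout of 30 s: the budget is used up at bolt #3.
        System.out.println(boltWhereTimeoutHits(fiveSlowBolts, 30)); // prints 3
        // With a 60 s timeout the 50 s chain completes in time.
        System.out.println(boltWhereTimeoutHits(fiveSlowBolts, 60)); // prints -1
    }
}
```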

So either raise the timeout configuration or fail faster (for example, with a shorter db connection timeout). Lowering TOPOLOGY_MAX_SPOUT_PENDING can sometimes help too, when lots of messages are waiting to be processed and the earlier ones take a long time.
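As a rough illustration, the two settings could be collected like this. The key strings are the ones Storm uses ("topology.message.timeout.secs" and "topology.max.spout.pending"); the class `TopologySettings` and the values 60 and 100 are assumptions chosen for the example, not recommendations. A plain map is used here so the sketch runs without Storm on the classpath; in a real topology you would normally put these values into Storm's own Config object, which is itself a map keyed by these strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that collects the two settings discussed above. */
class TopologySettings {

    static Map<String, Object> build(int messageTimeoutSecs, int maxSpoutPending) {
        Map<String, Object> conf = new HashMap<>();
        // How long a spout tuple may take to be fully processed before it fails.
        conf.put("topology.message.timeout.secs", messageTimeoutSecs);
        // Upper bound on un-acked spout tuples in flight per spout task.
        conf.put("topology.max.spout.pending", maxSpoutPending);
        return conf;
    }

    public static void main(String[] args) {
        // Illustrative values only: 60 s timeout, at most 100 pending tuples.
        Map<String, Object> conf = build(60, 100);
        System.out.println(conf.get("topology.message.timeout.secs")); // prints 60
        System.out.println(conf.get("topology.max.spout.pending"));    // prints 100
    }
}
```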

See the Apache Storm documentation page "Guaranteeing Message Processing" for more.
