
Storm KafkaSpout failed tuples duplicated

I am using storm-kafka-1.1.1-plus with Storm 1.1.1. My topology has one KafkaSpout and two bolts, bolt-A and bolt-B, both extending BaseRichBolt. Tuples are anchored in bolt-A; once bolt-B acks a tuple, it is considered successfully processed and its offset is committed. The problem is that, for some reason, some failed messages get duplicated by the KafkaSpout.

For Example

The KafkaSpout emitted 1000 tuples, and while they were being processed, about 20 of them failed (at bolt-B). Those 20 tuples were replayed continuously. At some point the worker was killed and the supervisor restarted it; the 20 tuples were replayed again and this time processed successfully, but they were processed multiple times (duplicated).


But I want those tuples to be processed only once (successfully). I have set topology.enable.message.timeouts to false. My other question is: where does Storm store the details of those failed Kafka offsets? I didn't find them in ZooKeeper; it only has the following:

{"topology":{"id":"test_Topology-12-1508938595","name":"test_Topology"},"offset":505,"partition":2,"broker":{"host":"127.0.0.1","port":9092},"topic":"test_topic_1"}

Disabling message timeouts can cause message loss; if you need all messages to be processed, you may want to reconsider disabling them.

Storm provides an at-least-once processing guarantee when acking is enabled. You might want to look at whether you can make your bolts idempotent, so replays don't cause problems. Alternatively, you can look at https://storm.apache.org/releases/1.1.1/Trident-tutorial.html , which offers exactly-once state updates.
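To illustrate what "idempotent" means here, a minimal sketch (not Storm API; the class and method names are made up for illustration): keep a record of which message IDs have already been applied, so a replayed tuple changes state at most once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an idempotent update step. A replayed message
// (same messageId) is detected and skipped, so the total is only
// incremented once per unique message even under at-least-once delivery.
public class IdempotentCounter {
    private final Map<String, Integer> applied = new HashMap<>();
    private int total = 0;

    // Returns true only the first time a given messageId is applied.
    public boolean process(String messageId, int amount) {
        if (applied.containsKey(messageId)) {
            return false; // replayed tuple: skip the state update
        }
        applied.put(messageId, amount);
        total += amount;
        return true;
    }

    public int total() {
        return total;
    }
}
```

In a real bolt you would key the deduplication on something unique in the tuple (e.g. the Kafka topic/partition/offset) and keep the "applied" record in the same store as the state itself, so the check and the update are atomic.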

Edit: You might need to rethink your problem. As far as I'm aware no stream processing system offers exactly-once processing in the sense it sounds like you want.

The exactly-once semantics offered by Trident mean that Trident helps you make state updates idempotent, so from the point of view of your data store it "looks like" each message was processed only once. Processing itself is still at-least-once. See the section "Transactional spouts" (and probably the rest of the page) at https://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-state.html for the intuition behind how this works. The basic idea is to store, in the data store, information about which messages have already been written, so that if they are repeated, the state-update code can ignore them.
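The transactional-state idea above can be sketched as follows (a simplified illustration of the technique, not Trident's actual API; the class and method names are assumptions): the store keeps the last transaction ID next to each value, so a replayed batch with the same txid is detected and its update is skipped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Trident-style transactional state: each key's
// value is stored together with the txid of the batch that last updated
// it. Replaying a batch (same txid) leaves the value unchanged.
public class TransactionalStore {
    private static final class Entry {
        final long txid;
        final long value;
        Entry(long txid, long value) {
            this.txid = txid;
            this.value = value;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Applies the increment only if this txid has not already been
    // applied for the key.
    public void increment(String key, long txid, long delta) {
        Entry current = store.get(key);
        if (current != null && current.txid == txid) {
            return; // replayed batch: update was already applied
        }
        long base = (current == null) ? 0 : current.value;
        store.put(key, new Entry(txid, base + delta));
    }

    public long get(String key) {
        Entry e = store.get(key);
        return (e == null) ? 0 : e.value;
    }
}
```

This only works if batches are replayed with the same txid and the same contents, which is exactly the guarantee a transactional spout provides.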

You might also want to read https://streaml.io/blog/exactly-once . I want to say that Flink implements something like the distributed snapshot algorithm described there, which is a different way to simulate exactly-once in an at-least-once system.
