
Multiple Flink pipelines for the same Kafka topic

Original 2019-11-14 09:53:00 8 1 apache-kafka/ apache-flink

Background

We have a Kafka topic with a steady stream of data. To process it we have a stateless Flink pipeline that consumes that topic and writes to another topic.

From time to time we get bursts of data that our Flink setup is not configured to handle. We don't want to size our Flink pipeline and cluster for the maximum possible load at all times; we want to scale dynamically with the load (budget reasons $$$).

Solutions we thought of

One way to do so is to add nodes to or remove nodes from the Flink cluster and change the parallelism of the Flink pipeline operators. This requires stopping the Flink job with a snapshot (savepoint), reconfiguring the parallelism, and restarting the job with the new parallelism.

This would be great, but we cannot afford the downtime it causes. We have to scale up and down without downtime.

If we used regular Kafka consumers, it would be as simple as adding a consumer (assuming we have enough Kafka partitions), and Kafka would redistribute the topic partitions among all the consumers.
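To make the redistribution idea concrete, here is a toy model in plain Python. It does not talk to Kafka: `assign_round_robin` and the consumer names are hypothetical, and the real group coordinator uses configurable assignors whose exact output may differ. It only shows how the same six partitions get spread out again once a third consumer joins.

```python
def assign_round_robin(partitions, consumers):
    """Toy model: spread partitions over consumers in round-robin order.

    This is an illustration only, not Kafka's actual assignor.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(6))  # assumed: a topic with 6 partitions

# Two consumers in the group: each owns three partitions.
before = assign_round_robin(partitions, ["consumer-a", "consumer-b"])

# A third consumer joins: the same partitions are spread over three owners.
after = assign_round_robin(
    partitions, ["consumer-a", "consumer-b", "consumer-c"]
)

print(before)
print(after)
```

With more consumers than partitions, the extra consumers would end up with an empty list, which is why the number of partitions caps how far this kind of scaling can go.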

The Flink Kafka consumer manages the partition assignment and the offsets on its own, which allows exactly-once semantics (we don't need them). The drawback is that a single Flink job always uses all the topic partitions.

We thought we could create another instance of the Flink job that would subscribe to the same topic with the same group and let Kafka distribute the partitions between the two. But for that we would need the Flink Kafka consumer to let Kafka manage which partitions are assigned to which consumer.

What we are looking for

We couldn't find a library that contains such a consumer, or a configuration option in the existing consumer that enables this behavior. We could write it on our own (not so difficult), but if there is an existing solution we'd rather use it.

Are we missing something? Are we misunderstanding something? Is there a better solution?

Thanks!

1 answer

The most straightforward approach, since you said that at worst you'll need double the capacity, would be to modify your topology so that it writes the Kafka messages it can't process quickly enough to a second, overflow Kafka topic. Both the input and output Kafka topic names would be configurable. Maybe a threshold on backlog delay automatically triggers this writing, or maybe the topology has a flag that you can set externally while it is running. That's a design detail you can work through, and it has operational implications.
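As a rough sketch of that routing decision, assuming the backlog delay is measured as the difference between the current time and the record's timestamp: the class `OverflowPolicy`, its field names, and the topic names are all hypothetical, and wiring this into an actual Flink job (for example in a process function ahead of two Kafka sinks) is left out.

```python
from dataclasses import dataclass


@dataclass
class OverflowPolicy:
    """Decides whether a record is processed here or pushed to overflow."""

    max_backlog_delay_ms: int      # hypothetical threshold, tune per workload
    force_overflow: bool = False   # hypothetical flag, set externally at runtime

    def target_topic(self, record_ts_ms, now_ms, output_topic, overflow_topic):
        # Backlog delay: how far behind the record's timestamp we are.
        backlog_delay_ms = now_ms - record_ts_ms
        if self.force_overflow or backlog_delay_ms > self.max_backlog_delay_ms:
            return overflow_topic
        return output_topic


policy = OverflowPolicy(max_backlog_delay_ms=5_000)

# Record is 1 s old: handled by this instance.
on_time = policy.target_topic(1_000, 2_000, "output", "overflow")

# Record is 6 s old: exceeds the 5 s threshold, goes to the overflow topic.
late = policy.target_topic(1_000, 7_000, "output", "overflow")

# Operator flips the flag: everything is diverted regardless of delay.
policy.force_overflow = True
forced = policy.target_topic(1_000, 2_000, "output", "overflow")

print(on_time, late, forced)
```

Keeping the decision in one small object means the threshold and the external flag can be changed without touching the rest of the topology, which is the operational trade-off mentioned above.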

This gives you a Flink topology that can handle some maximum number of messages in a timely fashion while writing the messages it can't handle to a second Kafka topic. You can then run a second instance of the same Flink topology that reads from that secondary topic and writes, if necessary, to a third topic. If the write to the overflow topic happens very early in the topology's processing, you could chain several of these instances together via Kafka with minimal latency and without having to reconfigure or restart any topologies.
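A back-of-the-envelope sketch of that chaining, assuming each instance has a fixed capacity per time interval and passes whatever exceeds it to the next topic. The function `run_chain` and the numbers are made up for illustration; real throughput depends on the job and the cluster.

```python
def run_chain(incoming, capacities):
    """Model a chain of identical topologies linked by overflow topics.

    incoming:   number of messages arriving in one interval
    capacities: per-instance capacity for that interval, in chain order
    Returns (messages handled per instance, messages left unhandled).
    """
    handled_per_instance = []
    remaining = incoming
    for capacity in capacities:
        handled = min(remaining, capacity)
        handled_per_instance.append(handled)
        remaining -= handled  # the rest goes to the next overflow topic
    return handled_per_instance, remaining


# Steady load: the first instance handles everything, the second stays idle.
steady, steady_left = run_chain(50, [100, 100])

# Burst: the first two instances saturate, the third absorbs the remainder.
burst, burst_left = run_chain(250, [100, 100, 100])

print(steady, steady_left)
print(burst, burst_left)
```

In this model the later instances only do work during bursts, so they could be started and stopped independently of the first one.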

Send in multiple topic kafka sink with flink

Kafka multiple partitions of the same topic in the same broker

Kafka: Can we have consumers subscribing to same topic but have different pipelines inside the topic?

Kafka multiple producer writing to same topic?

Create multiple consumers for same topic in kafka

Kafka Producer (with multiple instance) writing to same topic

Listen to multiple type of objects in the same Kafka topic

Kafka ordering with multiple producers on same topic and parititon

Kafka topic to multiple kafka topics dispatcher (same cluster)

Consuming from the beginning of a kafka topic with Flink




solution1 0 2019-11-17 07:20:20
