简体   繁体   中英

Apache Flink State Store vs Kafka Streams

As far as I know handles Kafka Streams its States localy in memory or on disc or in a Kafka topic because all the input date is from a partition, where all the messages are keyed by a defined value. Most of the time the computations can be done without knowing the state of other Processors. If so, you have another Streams instance whichs calculsates the result. Like in this picture:

在此处输入图片说明

Where exactly does Flink store its States? Can Flink also store the states locally or does it always publish them always to all instances (tasks)? Is it possible to configure Flink so that it stores the States in a Kafka Broker?

Flink also uses local stores (that can be keyed), similar to Kafka Streams. However, it does not write state into Kafka topics.

For fault-tolerance, it takes so-called "distributed snapshots", that are stored in a configurable state backend (eg, HDFS).

Check out the docs for more details:

There is a distinction between Flink and Kafka Streams. Flink is cluster framework, your code is deployed and run as job in Flink Cluster. Kafka streams is API that you embed in your standard java application. Stream processing logic runs inside the your application java process. They both can sink results to Kafka, key value store, database or external systems. Flink's master node implements its own high availability mechanism based on ZooKeeper and ensures the availability interim states after the disaster. If you are using Kafka Streams once you managed to save your interim states to Kafka Cluster you will have the same HA features provided by Kafka Cluster.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM