
Process large data in flink broadcast stream

I am using a Flink streaming Java application with Kafka as the input source. My application uses four streams in total: one is the main data stream, and the other three feed a broadcast stream.

Stream A is the main stream; it flows continuously from Kafka.

Stream B carries the enrichment data. It is the combination of Streams C, D and E, and it is large (all three of those streams are large).

Streams C, D and E each carry a different object type (for example, one carries Employee, another AttendanceDetails, another SalaryDetails, etc.).

I joined the three streams using the Either type, broadcast the result as Stream B, and can receive it in the broadcast state of my BroadcastProcessFunction (i.e. in processBroadcastElement()).
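Flink's `org.apache.flink.types.Either` has only two sides (left and right), so one common way to carry three payload types through a single broadcast stream is to nest them as `Either<Employee, Either<AttendanceDetails, SalaryDetails>>`. The sketch below is a plain-Java illustration of that nesting, not Flink code: the `Either` class here is a minimal local stand-in so the example compiles without Flink, and the payload classes and the `employee`/`attendance`/`salary`/`describe` helpers are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ThreeWayEither {

    /** Minimal two-sided Either: a local stand-in, NOT org.apache.flink.types.Either. */
    static final class Either<L, R> {
        private final boolean isLeft;
        private final L left;
        private final R right;

        private Either(boolean isLeft, L left, R right) {
            this.isLeft = isLeft;
            this.left = left;
            this.right = right;
        }

        static <L, R> Either<L, R> ofLeft(L value) { return new Either<>(true, value, null); }
        static <L, R> Either<L, R> ofRight(R value) { return new Either<>(false, null, value); }

        boolean isLeft() { return isLeft; }
        L left() { return left; }
        R right() { return right; }
    }

    // Hypothetical payload types for Streams C, D and E.
    static final class Employee {
        final String employeeId;
        Employee(String employeeId) { this.employeeId = employeeId; }
    }

    static final class AttendanceDetails {
        final String employeeId;
        AttendanceDetails(String employeeId) { this.employeeId = employeeId; }
    }

    static final class SalaryDetails {
        final String employeeId;
        SalaryDetails(String employeeId) { this.employeeId = employeeId; }
    }

    // Hypothetical helpers that wrap each payload into the nested Either used for Stream B.
    static Either<Employee, Either<AttendanceDetails, SalaryDetails>> employee(Employee e) {
        return Either.ofLeft(e);
    }

    static Either<Employee, Either<AttendanceDetails, SalaryDetails>> attendance(AttendanceDetails a) {
        return Either.ofRight(Either.ofLeft(a));
    }

    static Either<Employee, Either<AttendanceDetails, SalaryDetails>> salary(SalaryDetails s) {
        return Either.ofRight(Either.ofRight(s));
    }

    /** Unwraps one Stream B element: check the outer side first, then the inner side. */
    static String describe(Either<Employee, Either<AttendanceDetails, SalaryDetails>> element) {
        if (element.isLeft()) {
            return "employee " + element.left().employeeId;
        }
        Either<AttendanceDetails, SalaryDetails> inner = element.right();
        return inner.isLeft()
                ? "attendance " + inner.left().employeeId
                : "salary " + inner.right().employeeId;
    }

    public static void main(String[] args) {
        List<Either<Employee, Either<AttendanceDetails, SalaryDetails>>> streamB = Arrays.asList(
                employee(new Employee("e1")),
                attendance(new AttendanceDetails("e1")),
                salary(new SalaryDetails("e1")));
        for (Either<Employee, Either<AttendanceDetails, SalaryDetails>> element : streamB) {
            System.out.println(describe(element));
        }
    }
}
```

Running `main` prints `employee e1`, `attendance e1` and `salary e1`, one per line. In a real job, a dispatch like `describe` would typically sit inside `processBroadcastElement()`, deciding which part of the broadcast state each element updates.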

My questions are:

  1. Is it possible to store large data in broadcast state?

  2. Is it possible to broadcast large data?

If it is possible to store large data, how much data (i.e. what size) can be stored in broadcast state while still getting fault tolerance through Flink checkpoints? My Flink system's memory and storage sizes are:

       Memory: 8 GB
       Disk Size: 20-25 GB

How do I configure the memory size for broadcast state in Flink?

Note: As I understand it, Flink broadcast state is kept in memory at runtime (meaning it is not stored in RocksDB), and the broadcast stream is meant to be a low-throughput event stream, since the RocksDB state backend is currently not available for operator state.

The working copy of broadcast state is always on the heap, not in RocksDB, so it has to be small enough to fit in memory. Furthermore, each parallel instance copies all of the broadcast state into its checkpoints, so every checkpoint and savepoint holds n copies of the broadcast state (where n is the parallelism).
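Those two rules (a full on-heap working copy in each parallel instance, and n copies in every checkpoint) turn sizing into multiplication. A rough sketch, assuming a hypothetical 1 GB of enrichment data and parallelism 4 with every instance on the single 8 GB machine from the question; JVM object overhead, serialization and retained older checkpoints are ignored, so real usage will be higher.

```java
public class BroadcastStateSizing {

    /** Heap for broadcast state on one machine: one full working copy per parallel instance there. */
    static long heapBytesNeeded(long stateBytes, int instancesOnMachine) {
        return stateBytes * instancesOnMachine;
    }

    /** Size of one checkpoint or savepoint: n copies of the broadcast state, n = parallelism. */
    static long checkpointBytes(long stateBytes, int parallelism) {
        return stateBytes * parallelism;
    }

    public static void main(String[] args) {
        final long GB = 1L << 30;
        long stateBytes = GB;   // hypothetical: 1 GB of enrichment data from Streams C, D and E
        int parallelism = 4;    // hypothetical: all 4 instances run on the single 8 GB machine
        System.out.printf("heap needed: %d GB of 8 GB RAM%n",
                heapBytesNeeded(stateBytes, parallelism) / GB);
        System.out.printf("per checkpoint: %d GB of 20-25 GB disk%n",
                checkpointBytes(stateBytes, parallelism) / GB);
    }
}
```

With these numbers it prints 4 GB for both lines: half the RAM is gone before Stream A, the rest of the JVM or the OS get any, and if checkpoints are stored on the same local disk, each one takes roughly 16-20% of it.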

If you are able to key-partition this data, then you might not need to broadcast it at all. It sounds like it might be per-employee data that could be keyed by employeeId. But if not, then you'll have to keep it small enough to fit into memory.
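To see why keying helps, compare how many enrichment records each parallel instance must hold. The sketch assumes a hypothetical list of 10,000 employee IDs and parallelism 4, and uses a simplified `Math.floorMod(hashCode, n)` partitioner. Flink's real assignment goes through key groups, so the exact split differs, but each instance still holds only its share of the keys instead of all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyedVsBroadcast {

    /** Simplified partitioner: NOT Flink's key-group assignment, just hash mod n. */
    static int instanceFor(String employeeId, int parallelism) {
        return Math.floorMod(employeeId.hashCode(), parallelism);
    }

    /** Records each instance holds if the enrichment data is keyed by employeeId. */
    static int[] keyedCounts(List<String> employeeIds, int parallelism) {
        int[] counts = new int[parallelism];
        for (String id : employeeIds) {
            counts[instanceFor(id, parallelism)]++;
        }
        return counts;
    }

    /** Records each instance holds if the same data is broadcast: every record, everywhere. */
    static int[] broadcastCounts(List<String> employeeIds, int parallelism) {
        int[] counts = new int[parallelism];
        Arrays.fill(counts, employeeIds.size());
        return counts;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            ids.add("emp-" + i);  // hypothetical employee ids
        }
        int parallelism = 4;
        System.out.println("keyed:     " + Arrays.toString(keyedCounts(ids, parallelism)));
        System.out.println("broadcast: " + Arrays.toString(broadcastCounts(ids, parallelism)));
    }
}
```

The keyed counts sum to 10,000 across the four instances, while the broadcast line shows 10,000 in every instance. In a Flink job the keyed alternative would mean keying both Stream A and the enrichment data by employeeId and connecting them, so each employee's enrichment lives in keyed state, which, unlike broadcast state, can be kept in RocksDB.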
