I would like to implement an Apache Flink trigger that fires when the state accumulates 256 MB. My sink writes Parquet files to HDFS, and I want to run ETL on those files later, so I don't want them too small or too large; meanwhile my source (an Apache Kafka topic) changes in volume constantly.
I didn't find a way to do this. I did find the StateObject interface, which has a size() method, but I couldn't find a way to use it.
I would use a Flink FileSink with the Parquet bulk format, with a rolling policy that caps the file size but also rolls based on your maximum allowable latency.
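As a sketch of the idea: with bulk-encoded formats such as Parquet, Flink's FileSink rolls on every checkpoint by default (OnCheckpointRollingPolicy), and a custom policy extending CheckpointRollingPolicy can additionally roll on size or age. The class below is a hypothetical, dependency-free illustration of just the decision logic such a policy would implement; the method names mirror the RollingPolicy callbacks (shouldRollOnEvent, shouldRollOnProcessingTime), and the 256 MB / latency thresholds are the ones from the question, not Flink defaults.

```java
// Hypothetical roll-decision logic, mirroring the checks a custom Flink
// CheckpointRollingPolicy would perform. In a real sink you would extend
// org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies
// .CheckpointRollingPolicy and call these checks from its overrides.
public class SizeOrLatencyRollPolicy {
    private final long maxPartSizeBytes;   // e.g. 256 MB target from the question
    private final long maxPartAgeMillis;   // maximum allowable latency before rolling

    public SizeOrLatencyRollPolicy(long maxPartSizeBytes, long maxPartAgeMillis) {
        this.maxPartSizeBytes = maxPartSizeBytes;
        this.maxPartAgeMillis = maxPartAgeMillis;
    }

    // Corresponds to shouldRollOnEvent(partFileState, element):
    // roll as soon as the in-progress part file reaches the size cap.
    public boolean shouldRollOnEvent(long currentPartSizeBytes) {
        return currentPartSizeBytes >= maxPartSizeBytes;
    }

    // Corresponds to shouldRollOnProcessingTime(partFileState, currentTime):
    // roll when the part file has been open longer than the latency bound,
    // so low-volume periods still produce files within the allowed delay.
    public boolean shouldRollOnProcessingTime(long partCreationTimeMillis, long nowMillis) {
        return nowMillis - partCreationTimeMillis >= maxPartAgeMillis;
    }
}
```

This decouples the file size from the state size: instead of measuring operator state, you let the sink measure the bytes already written to the in-progress part file, which is what actually determines the Parquet file size on HDFS.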