簡體   English   中英

Apache Flink交易匯總

[英]apache flink aggregation of transaction

我一直試圖找出如何編寫一個flink程序來接收來自3 kafka主題的事件,並對今天,昨天和前天進行總結。

所以第一個問題是,我如何在3個不同的天對交易進行匯總並將其提取為json文件

如果您想閱讀3個不同的kafka主題或分區,則必須創建3個kafka來源

Flink關於Kafka消費者的文檔

val env = StreamExecutionEnvironment.getExecutionEnvironment()
val consumer0 = new FlinkKafkaConsumer08[String](...)
val consumer1 = new FlinkKafkaConsumer08[String](...)
val consumer2 = new FlinkKafkaConsumer08[String](...)
consumer0.setStartFromGroupOffsets()
consumer1.setStartFromGroupOffsets()
consumer2.setStartFromGroupOffsets()

val stream0 = env.addSource(consumer0)
val stream1 = env.addSource(consumer1)
val stream2 = env.addSource(consumer2)

val unitedStream = stream0.union(stream1,stream2)

/* Logic to group transactions from 3 days */
/* I need more info, but it should be a Sliding or Fixed windows Keyed by the id of the transactions*/

val windowSize = 1 // number of days that the window use to group events
val windowStep = 1 // window slides 1 day

val reducedStream = unitedStream
    .keyBy("transactionId") // or any field that groups transactions in the same group
    .timeWindow(Time.days(windowSize),Time.days(windowStep))
    .map(transaction => {
        transaction.numberOfTransactions = 1
        transaction
    }).sum("numberOfTransactions");

val streamFormatedAsJson = reducedStream.map(functionToParseDataAsJson) 
// you can use a library like GSON for this
// or a scala string template

streamFormatedAsJson.sink(yourFavoriteSinkToWriteYourData)

如果您的主題名稱可以與“常規”表達式匹配,則只能創建一個kafka使用者,如下所示:

val env = StreamExecutionEnvironment.getExecutionEnvironment()

val consumer = new FlinkKafkaConsumer08[String](
  java.util.regex.Pattern.compile("day-[1-3]"),
  ..., //check documentation to know how to fill this field
  ...) //check documentation to know how to fill this field

val stream = env.addSource(consumer)

最常見的方法是將所有事務都放在同一個kafka主題中,而不是在differents主題中,在這種情況下,代碼將更加簡單,因為您只需要使用窗口來處理數據即可

Day 1 -> 11111 -\
Day 2 -> 22222 --> 1111122222333 -> Window -> 11111 22222 333 -> reduce operation per window partition
Day 3 -> 3333 --/                            |-----|-----|---|

范例程式碼

val env = StreamExecutionEnvironment.getExecutionEnvironment()
val consumer = new FlinkKafkaConsumer08[String](...)
consumer.setStartFromGroupOffsets()

val stream = env.addSource(consumer)

/* Logic to group transactions from 3 days */
/* I need more info, but it should be a Sliding or Fixed windows Keyed by the id of the transactions*/

val windowSize = 1 // number of days that the window use to group events
val windowStep = 1 // window slides 1 day

val reducedStream = stream
    .keyBy("transactionId") // or any field that groups transactions in the same group
    .timeWindow(Time.days(windowSize),Time.days(windowStep))
    .map(transaction => {
        transaction.numberOfTransactions = 1
        transaction
    }).sum("numberOfTransactions");

val streamFormatedAsJson = reducedStream.map(functionToParseDataAsJson) 
// you can use a library like GSON for this
// or a scala string template

streamFormatedAsJson.sink(yourFavoriteSinkToWriteYourData)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM