
Data aggregation using Apache Flink

I am trying to create a real-time number counter using the Flink DataStream API, but I am running into problems getting it to work.

Example data payload:

{
    "room": 1,  # room number
    "numbers": [101, 111, 201, 211, 13, ....], # only these numbers should appear in the output, with their counts
    "my_number": 401  # my current number, which depends on the room
}

There are only four rooms (1, 2, 3 and 4), and my_number varies by room. This is the stream data I am passing to Flink.

Problem statement: I want to count the numbers per room, and in the output return only the values from the "numbers" arrays, each with its count. The same applies to every room.

Output example:
 [
    {
        101: 2,
        111: 5,
        201: 1,
        .
        .
        .
    }
 ]
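To pin down the requested semantics, here is a minimal plain-Java sketch with no Flink involved, assuming each payload has already been deserialized into a simple object. The names Payload, RoomNumberCounts and countByRoom are hypothetical. For each room it counts how often each number appears across all of that room's "numbers" arrays; my_number is carried along but not used, since the question does not say how it should affect the output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the deserialized JSON payload.
class Payload {
    final int room;
    final List<Integer> numbers;
    final int myNumber; // kept for completeness, not used when counting

    Payload(int room, List<Integer> numbers, int myNumber) {
        this.room = room;
        this.numbers = numbers;
        this.myNumber = myNumber;
    }
}

class RoomNumberCounts {
    // room -> (number -> how many times it appeared in that room's "numbers" arrays)
    static Map<Integer, Map<Integer, Integer>> countByRoom(List<Payload> payloads) {
        Map<Integer, Map<Integer, Integer>> result = new HashMap<>();
        for (Payload p : payloads) {
            // one inner counts map per room, created on first sight of that room
            Map<Integer, Integer> counts = result.computeIfAbsent(p.room, r -> new HashMap<>());
            for (int n : p.numbers) {
                counts.merge(n, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return result;
    }
}
```

In a Flink job, keying the stream by room plays the role of the outer map, and per-key state plays the role of the inner one.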

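If I understand correctly, you can do this: key the stream by room, keep a per-room MapState of number counts in a KeyedProcessFunction, and then map the counts to your output format: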

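import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// assumes DataPayload is a POJO with public fields room (int) and numbers (List<Integer>)
dataPayloadSource.keyBy(dp -> dp.room).process(new CountNumbers()).flatMap(new MapDataPayloadToCorrespondObject()).addSink(...);

// ...
public class CountNumbers extends KeyedProcessFunction<Integer, DataPayload, Map<Integer, Integer>> {
    // number -> count; Flink scopes keyed state to the current key, so there is one map per room
    private transient MapState<Integer, Integer> numberCountState;

    @Override
    public void open(Configuration config) {
        numberCountState = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("numberCounts", Integer.class, Integer.class));
    }

    @Override
    public void processElement(DataPayload dp, Context ctx, Collector<Map<Integer, Integer>> out) throws Exception {
        for (Integer number : dp.numbers) {
            // get() returns null if the map does not have that key yet: initialize with 1, otherwise increment by 1
            Integer current = numberCountState.get(number);
            numberCountState.put(number, current == null ? 1 : current + 1);
        }
        // emit a snapshot of this room's current counts
        Map<Integer, Integer> counts = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : numberCountState.entries()) {
            counts.put(e.getKey(), e.getValue());
        }
        out.collect(counts);
    }
}

// ...
public class MapDataPayloadToCorrespondObject extends RichFlatMapFunction<Map<Integer, Integer>, OutputObject> {
    @Override
    public void flatMap(Map<Integer, Integer> counts, Collector<OutputObject> out) {
        // convert the room's counts to OutputObject and pass it to out.collect(...)
    }
}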
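The answer leaves the final conversion step open. One possible body, sketched here as a plain static helper without Flink types, renders a room's counts map in the shape of the question's output example. The names CountsFormatter and formatCounts are hypothetical, and sorting by number through a TreeMap is a choice made only so the output is deterministic.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

class CountsFormatter {
    // Renders "{number: count, ...}" sorted by number, e.g. for one room's snapshot of counts.
    static String formatCounts(Map<Integer, Integer> counts) {
        StringJoiner joiner = new StringJoiner(", ", "{", "}");
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(counts).entrySet()) {
            joiner.add(e.getKey() + ": " + e.getValue());
        }
        return joiner.toString();
    }
}
```

If OutputObject were simply a String, the flatMap in the answer could pass formatCounts(counts) straight to out.collect(...); a richer output type could be built from the same sorted entries.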
