
How to use the tumbling window function for non-keyed streaming data in Flink?

I want to use the tumbling window function in my program (non-keyed data). It processes streaming data, but only at about 300 messages/sec, and I want to raise that to at least 5K/sec. To that end, I want to try a 2-second tumbling window and see whether it speeds things up, but I am not sure how to apply it in my case.

Note: I am using the GeoMesa HBase platform to store the messages. I have not pasted my whole application here; the code below contains only the parts relevant to the window function.

Here is my Flink code:

public class Tranport {

    public static void main(String[] args) throws Exception {

        // fetch runtime arguments
        String bootstrapServers = "xx.xx.xxx.xxx:xxxx";

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Set up the Consumer and create a datastream from this source
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", bootstrapServers);
        properties.setProperty("group.id", "group_id");
        final FlinkKafkaConsumer<String> flinkConsumer = new FlinkKafkaConsumer<>("lc", new SimpleStringSchema(), properties);
        flinkConsumer.setStartFromTimestamp(Long.parseLong("0"));

        DataStream<String> readingStream = env.addSource(flinkConsumer);
        readingStream.rebalance().map(new RichMapFunction<String, String>() {

            private static final long serialVersionUID = -2547861355L; // random number

            DataStore lc_live = null;
            
            SimpleFeatureType sft_live;
            SimpleFeatureBuilder SFbuilderLive; // feature builder for live

            List<SimpleFeature> lc_live_features; // features accumulated for the next write

            // simple counters for monitoring
            long numberOfMessagesProcessed;
            long numberOfMessagesFailed;
            long numberOfMessagesSkipped;

            @Override
            public void open(Configuration parameters) throws Exception {
                System.out.println("In open method.");

                // --- GEOMESA, GEOTOOLS APPROACH ---//
                // define connection parameters to xxx GeoMesa-HBase DataStore
                Map<String, Serializable> params_live = new HashMap<>();
                params_live.put("xxxx", "xxx"); // HBase table name
                params_live.put("xxxx","xxxx");

                try {
                    lc_live = DataStoreFinder.getDataStore(params_live);
                    if (lc_live == null) {
                        System.out.println("Could not connect to live");
                    } else {
                        System.out.println("Successfully connected to live");
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }

                // create simple feature type for x table in HBASE 
                StringBuilder attributes1 = new StringBuilder();
                attributes1.append("xxx:String,");
                attributes1.append("xxx:Long,");
                attributes1.append("source:String,");
                attributes1.append("xxx:String,");
                attributes1.append("xxx:Double,");
                attributes1.append("status:String,");
                attributes1.append("forecast:Double,");
                attributes1.append("carsCount:Integer,");
                attributes1.append("*xxx:Point:srid=4326");
                sft_live = SimpleFeatureTypes.createType("xxxx", attributes1.toString());

                try {
                    lc_live.createSchema(sft_live);
                } catch (IOException e) {
                    e.printStackTrace();
                }

                // Initialize the variables
                numberOfMessagesProcessed = 0;
                numberOfMessagesFailed = 0;
                numberOfMessagesSkipped = 0;

                // for lc_live
                lc_live_features = new ArrayList<>();
                SFbuilderLive = new SimpleFeatureBuilder(sft_live);
            }

Here I want to create a tumbling window (windowAll) that collects all the stream messages arriving within a 2-second window and pushes them into the array list I created, which is then written out below:

        
                        // live GeoMesa-HBase DataStore
                        // copy the list into a local variable and empty the list for the next iteration
                        List<SimpleFeature> LocalFeatures = lc_live_features;
                        lc_live_features = new ArrayList<>();
                        LocalFeatures = Collections.unmodifiableList(LocalFeatures);
                        try (FeatureWriter<SimpleFeatureType, SimpleFeature> writer = lc_live.getFeatureWriterAppend(sft_live.getTypeName(), Transaction.AUTO_COMMIT)) {
                            System.out.println("Writing " + LocalFeatures.size() + " features to live");
                            for (SimpleFeature feature : LocalFeatures) {
                                SimpleFeature toWrite = writer.next();
                                toWrite.setAttributes(feature.getAttributes());
                                ((FeatureIdImpl) toWrite.getIdentifier()).setID(feature.getID());
                                toWrite.getUserData().put(Hints.USE_PROVIDED_FID, Boolean.TRUE);
                                toWrite.getUserData().putAll(feature.getUserData());
                                writer.write();
                            }
                        } catch (IOException e) {
                            e.printStackTrace();
                        }

It's late, but this might help someone. In Scala, you can do something like:

 env.addSource(consumer).
      windowAll(TumblingProcessingTimeWindows.of(Time.seconds(2)))  

But remember: if you are not using keyBy(), your data won't be processed in parallel no matter what value you set via env.setParallelism(); a windowAll over a non-keyed stream always runs with parallelism 1.
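To see why keyBy() matters for throughput: Flink routes keyed elements to parallel subtasks by hashing the key, so each subtask evaluates windows for its own share of keys, while windowAll funnels every element through a single subtask. A simplified plain-Java illustration of that routing (the real scheme uses key groups and murmur hashing, so this is a sketch of the idea, not Flink's actual code):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class KeyedPartitioning {
    // Simplified stand-in for Flink's keyed routing: map a key to one of
    // `parallelism` subtasks by its hash
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        List<String> keys = List.of("sensor-1", "sensor-2", "sensor-3", "sensor-4", "sensor-5");

        // keyBy: each key's windows fire on the subtask that owns the key,
        // so window work is spread across subtasks
        Map<Integer, List<String>> keyed = keys.stream()
            .collect(Collectors.groupingBy(k -> subtaskFor(k, parallelism),
                                           TreeMap::new, Collectors.toList()));
        System.out.println("keyBy     -> " + keyed);

        // windowAll: every element lands on the same single subtask,
        // regardless of the configured parallelism
        Map<Integer, List<String>> nonKeyed = keys.stream()
            .collect(Collectors.groupingBy(k -> 0, TreeMap::new, Collectors.toList()));
        System.out.println("windowAll -> " + nonKeyed);
    }
}
```

If batching into 2-second windows alone doesn't reach 5K/sec, keying the stream (for example by a device or region field, names here are hypothetical) lets the windows and the GeoMesa writes run in parallel.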
