Can't access the data in Kafka Spark Streaming globally

I am trying to stream data from Kafka to Spark:

JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils.createDirectStream(ssc,
                String.class, 
                String.class, 
                StringDecoder.class, 
                StringDecoder.class, 
                kafkaParams, topics);
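
For context, this assumes a JavaStreamingContext and Kafka 0.8-style direct-stream parameters roughly like the following sketch (the batch interval, broker address and topic name are placeholders):

SparkConf conf = new SparkConf().setAppName("KafkaSparkStreaming");
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10)); // placeholder batch interval

Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "localhost:9092"); // placeholder broker list

Set<String> topics = Collections.singleton("my-topic"); // placeholder topic name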

Here I am iterating over the JavaPairInputDStream to process the RDDs:

directKafkaStream.foreachRDD(rdd -> {
    rdd.foreachPartition(items -> {
        while (items.hasNext()) {
            String[] State = items.next()._2.split("\\,");
            System.out.println(State[2] + "," + State[3] + "," + State[4] + "--");
        }
    });
});

I am able to fetch the data inside foreachRDD, but my requirement is to access the State array globally. When I try to access the State array globally, I get this exception:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

Any suggestions? Thanks.
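
The exception most likely occurs because the lambda passed to foreachPartition runs on the executors, not on the driver: any driver-side collection it appears to fill is captured into the serialized closure and mutated remotely, so on the driver it stays empty. A sketch of the failing pattern (the driver-side list is an assumption about the surrounding code):

List<String[]> states = new ArrayList<>(); // lives on the driver
directKafkaStream.foreachRDD(rdd ->
    rdd.foreachPartition(items -> {
        while (items.hasNext()) {
            states.add(items.next()._2.split("\\,")); // mutates an executor-side copy
        }
    }));
// back on the driver, states is still empty, so states.get(0) throws
// java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

If the values really were needed on the driver they would have to be collected explicitly (e.g. rdd.collect()); for this use case, though, there is a better approach.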

This is more a case of joining your lookup table with the streaming RDD to get all the items that have matching 'code' and 'violationCode' fields.

The flow should be like this:

  1. Create an RDD from the Hive lookup table => lookupRdd
  2. Create a DStream from the Kafka stream
  3. For each RDD in the DStream, join lookupRdd with streamRdd, process the joined items (calculate the sum of the amounts, ...) and save the processed result.

Note: the code below is incomplete. Please complete all the TODO comments.

JavaPairDStream<String, String> streamPair = directKafkaStream.mapToPair(new PairFunction<Tuple2<String, String>, String, String>() {
        @Override
        public Tuple2<String, String> call(Tuple2<String, String> tuple2) throws Exception {
            System.out.println("Tuple2 Message is----------" + tuple2._2());
            String[] state = tuple2._2.split("\\,");
            return new Tuple2<>(state[4], tuple2._2()); //pair <ViolationCode, data>
        }
    });

    streamPair.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
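        // hivePairRdd is cached here and reused across batches; the function
        // passed to foreachRDD runs on the driver, so lazy initialization is safe.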
        JavaPairRDD<String, String> hivePairRdd = null;
        @Override
        public Void call(JavaPairRDD<String, String> stringStringJavaPairRDD) throws Exception {
            if (hivePairRdd == null) {
                hivePairRdd = initHiveRdd();
            }
            JavaPairRDD<String, Tuple2<String, String>> joinedRdd = stringStringJavaPairRDD.join(hivePairRdd);
            System.out.println(joinedRdd.take(10));
            //todo process joinedRdd here and save the results.
            joinedRdd.count(); //to trigger an action
            return null;
        }
    });
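    // Note: the streaming context still has to be started for any of this to run;
    // a minimal sketch (ssc is assumed to be the JavaStreamingContext from above):
    ssc.start();            // begin consuming from Kafka
    ssc.awaitTermination(); // block until the streaming job is stopped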
}

public static JavaPairRDD<String, String> initHiveRdd() {
    JavaRDD<String> hiveTableRDD = null; //todo code to create RDD from hive table
    JavaPairRDD<String, String> hivePairRdd = hiveTableRDD.mapToPair(new PairFunction<String, String, String>() {
        @Override
        public Tuple2<String, String> call(String row) throws Exception {
            String code = null; //TODO process 'row' and get 'code' field
            return new Tuple2<>(code, row);
        }
    });
    return hivePairRdd;
}
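
For the first TODO, here is a minimal sketch of building the lookup RDD, assuming a Spark 1.x HiveContext and a hypothetical Hive table named violation_lookup whose first column is 'code':

// Sketch only: the HiveContext wiring, table name and column layout are assumptions.
public static JavaPairRDD<String, String> initHiveRdd(JavaSparkContext sc) {
    HiveContext hiveContext = new HiveContext(sc.sc());
    JavaRDD<String> hiveTableRDD = hiveContext
            .sql("SELECT * FROM violation_lookup") // hypothetical lookup table
            .toJavaRDD()
            .map(row -> row.mkString(","));        // flatten each Row to a CSV line
    return hiveTableRDD.mapToPair(row -> new Tuple2<>(row.split("\\,")[0], row)); // 'code' assumed in column 0
}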
