
Cogroup 5 RDD (get Tuple5 or more)

I would like to know whether it is possible in Spark to cogroup five RDDs and get a Tuple5 of Iterables. I can only get as far as a Tuple4:

    JavaPairRDD<PartitionKey, Tuple4<Iterable<Cat>, Iterable<Dog>, Iterable<Fish>, Iterable<Monkey>>>

    JavaPairRDD<PartitionKey, Cat> RDD1 = getRDD1();
    JavaPairRDD<PartitionKey, Dog> RDD2 = getRDD2();
    JavaPairRDD<PartitionKey, Fish> RDD3 = getRDD3();
    JavaPairRDD<PartitionKey, Monkey> RDD4 = getRDD4();
    JavaPairRDD<PartitionKey, Cow> RDD5 = getRDD5();

    return RDD1.cogroup(RDD2, RDD3, RDD4);

How would you do something like this:

    JavaPairRDD<PartitionKey, Tuple5<Iterable<Cat>, Iterable<Dog>, Iterable<Fish>, Iterable<Monkey>, Iterable<Cow>>> result = RDD1.cogroup(RDD2, RDD3, RDD4, RDD5);

I really need those cows :)

Thank you

We did not use cogroup. Instead, we created a SuperObject that holds all the lists.

For each of the 5 RDDs:

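    // Assumes superRDD is a JavaPairRDD<PartitionKey, SuperObject>.
    // join yields (key, (leftValue, rightValue)), so superRDD goes on the left
    // and the source RDD is grouped by key first.
    superRDD.join(oneOfThe5RDD.groupByKey()).mapToPair(tuple -> {
        SuperObject superObject = tuple._2()._1();
        // setListXXX stands for the setter matching this RDD's element type
        superObject.setListXXX(IteratorUtils.toList(tuple._2()._2().iterator()));
        return new Tuple2<>(tuple._1(), superObject);
    });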
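To make the "one object holding every list" idea concrete, here is a minimal sketch of the per-key merge on plain Java collections, with no Spark involved. Everything in it is an assumption made for illustration: this `SuperObject` is a hypothetical stand-in (one untyped list per source instead of typed setters), and `kv`, `cogroupAll` and `demo` are made-up helper names. In Spark, a comparable pipeline would probably wrap each value with `mapValues`, `union` the five RDDs, and pass the merge function to `reduceByKey`, but that part is not shown or tested here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CogroupSketch {

    /** Hypothetical stand-in for the answer's SuperObject: one list per source. */
    public static final class SuperObject {
        public final List<List<Object>> lists = new ArrayList<>();

        public SuperObject(int sources) {
            for (int i = 0; i < sources; i++) {
                lists.add(new ArrayList<>());
            }
        }

        /** Wraps a single value that came from source number `index`. */
        public static SuperObject of(int sources, int index, Object value) {
            SuperObject s = new SuperObject(sources);
            s.lists.get(index).add(value);
            return s;
        }

        /** Associative merge: concatenates the lists of a and b slot by slot. */
        public static SuperObject merge(SuperObject a, SuperObject b) {
            SuperObject out = new SuperObject(a.lists.size());
            for (int i = 0; i < a.lists.size(); i++) {
                out.lists.get(i).addAll(a.lists.get(i));
                out.lists.get(i).addAll(b.lists.get(i));
            }
            return out;
        }
    }

    /** Small helper so that every entry has the same static type. */
    public static Map.Entry<String, Object> kv(String key, Object value) {
        return Map.entry(key, value);
    }

    /** Groups any number of keyed sources by key, like an N-way cogroup done locally. */
    public static Map<String, SuperObject> cogroupAll(List<List<Map.Entry<String, Object>>> sources) {
        Map<String, SuperObject> result = new HashMap<>();
        for (int i = 0; i < sources.size(); i++) {
            for (Map.Entry<String, Object> e : sources.get(i)) {
                SuperObject single = SuperObject.of(sources.size(), i, e.getValue());
                result.merge(e.getKey(), single, SuperObject::merge);
            }
        }
        return result;
    }

    /** Five tiny sources; strings stand in for Cat, Dog, Fish, Monkey and Cow. */
    public static Map<String, SuperObject> demo() {
        List<Map.Entry<String, Object>> cats = List.of(kv("k1", "Tom"), kv("k2", "Felix"));
        List<Map.Entry<String, Object>> dogs = List.of(kv("k1", "Rex"));
        List<Map.Entry<String, Object>> fish = List.of(kv("k2", "Nemo"));
        List<Map.Entry<String, Object>> monkeys = List.of(kv("k1", "George"));
        List<Map.Entry<String, Object>> cows = List.of(kv("k1", "Bessie"), kv("k1", "Daisy"));
        return cogroupAll(List.of(cats, dogs, fish, monkeys, cows));
    }

    public static void main(String[] args) {
        demo().forEach((key, value) -> System.out.println(key + " -> " + value.lists));
    }
}
```

One difference from the join-based snippet above: a key that is missing from some sources still shows up here, with empty lists in those slots, which is closer to what cogroup returns. An inner join would drop such keys.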
