
Spark-Java : Display join RDD

I am trying to join two pair RDDs, as shown below, where:

lat1: (K, V), with K an Integer and V a Double
lat2: (K, V), with K an Integer and V a Double

   JavaPairRDD<Integer,Tuple2<Double,Double>> latlong = lat1.join(lat2);

I am assuming the new RDD will have the form K, [V1, V2], and I want to display it.

Also, if I want to perform operations based on the values, how should I do that?

Please suggest a solution using the Spark Java API.

PS: Many of the answers I have seen are in Scala, but my requirement is to implement this in Java.

From the Spark documentation:

When join called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

So your assumption is correct:

JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat1.join(lat2);
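To see the (K, (V, W)) shape and the "all pairs of elements for each key" behaviour concretely, here is a minimal, self-contained sketch. It assumes Spark's Java API is on the classpath and runs in local mode; the class JoinSemanticsDemo, the helper joinSample() and the sample numbers are all hypothetical. Key 1 appears twice in lat1, so it produces two joined records, while keys that exist on only one side (3 and 4) are dropped, because join() is an inner join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class JoinSemanticsDemo {

    // Hypothetical helper: joins two small (Integer, Double) pair RDDs and returns the joined records.
    static List<Tuple2<Integer, Tuple2<Double, Double>>> joinSample() {
        // "local[*]" runs Spark inside this JVM; a real job would use the cluster's master URL.
        SparkConf conf = new SparkConf().setAppName("join-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical sample data; key 1 occurs twice in lat1.
            JavaPairRDD<Integer, Double> lat1 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 10.0), new Tuple2<>(1, 11.0),
                    new Tuple2<>(2, 20.0), new Tuple2<>(3, 30.0)));
            JavaPairRDD<Integer, Double> lat2 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 100.0), new Tuple2<>(2, 200.0), new Tuple2<>(4, 400.0)));

            // Inner join on the key: each value is a Tuple2 of (value from lat1, value from lat2).
            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat1.join(lat2);

            // collect() copies every record to the driver; only do this for small results.
            return new ArrayList<>(latlong.collect());
        }
    }

    public static void main(String[] args) {
        // Tuple2.toString() renders each record as (key,(v,w)); the order is not guaranteed.
        for (Tuple2<Integer, Tuple2<Double, Double>> record : joinSample()) {
            System.out.println(record);
        }
    }
}
```

With this data the result has three records: two for key 1 (one per lat1 value) and one for key 2.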

To work with the values of a JavaPairRDD, use the mapValues() method:

Pass each value in the key-value pair RDD through a map function without changing the keys; this also retains the original RDD's partitioning.
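Operations based on the value usually combine two transformations: filter() when you need to drop records (it receives the whole (key, value) tuple), and mapValues() when you need to transform the value while keeping the key. The sketch below assumes Java 8 lambdas and Spark's Java API on the classpath; the class ValueOpsDemo, the helper filterAndCombine(), the threshold and the "add the two numbers" operation are hypothetical and only illustrate the pattern.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ValueOpsDemo {

    // Hypothetical helper: joins two pair RDDs, keeps records whose first value exceeds
    // the threshold, then combines each (V, W) value into a single Double.
    static Map<Integer, Double> filterAndCombine(double threshold) {
        SparkConf conf = new SparkConf().setAppName("value-ops-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, Double> lat1 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 10.0), new Tuple2<>(2, 20.0), new Tuple2<>(3, 30.0)));
            JavaPairRDD<Integer, Double> lat2 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 1.5), new Tuple2<>(2, 2.5), new Tuple2<>(3, 3.5)));

            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat1.join(lat2);

            // filter() sees the whole (key, value) record, so the value is reached via _2().
            JavaPairRDD<Integer, Tuple2<Double, Double>> kept =
                    latlong.filter(record -> record._2()._1() > threshold);

            // mapValues() sees only the value; keys and partitioning are left unchanged.
            // Adding the two numbers is purely illustrative.
            JavaPairRDD<Integer, Double> combined = kept.mapValues(v -> v._1() + v._2());

            // collectAsMap() is fine here because every key is unique after the join.
            return new HashMap<>(combined.collectAsMap());
        }
    }

    public static void main(String[] args) {
        System.out.println(filterAndCombine(15.0));
    }
}
```

Keeping value-only logic in mapValues() rather than a general map is what lets Spark retain the RDD's partitioner, as the quoted documentation notes.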

To display the JavaPairRDD, use the usual output actions, e.g. saveAsTextFile() to write it to files, or collect() / take(n) to bring records back to the driver and print them there.
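A sketch of printing on the driver, assuming Spark's Java API on the classpath; the class DisplayDemo, the helper firstLines() and the sample data are hypothetical. It sorts by key for a stable order and uses take(n) so that only n records reach the driver. Calling latlong.foreach(...) with a println instead would run on the executors, so on a cluster the output would land in executor logs rather than the driver console. saveAsTextFile() writes a directory of part files containing each record's toString().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class DisplayDemo {

    // Hypothetical helper: returns the first n joined records, ordered by key, as printable lines.
    static List<String> firstLines(int n) {
        SparkConf conf = new SparkConf().setAppName("display-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, Double> lat1 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 10.0), new Tuple2<>(2, 20.0), new Tuple2<>(3, 30.0)));
            JavaPairRDD<Integer, Double> lat2 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 1.5), new Tuple2<>(2, 2.5), new Tuple2<>(3, 3.5)));
            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat1.join(lat2);

            // sortByKey() gives a stable display order; take(n) ships only n records to the
            // driver, unlike collect(), which ships all of them.
            List<Tuple2<Integer, Tuple2<Double, Double>>> head = latlong.sortByKey().take(n);

            List<String> lines = new ArrayList<>();
            for (Tuple2<Integer, Tuple2<Double, Double>> record : head) {
                lines.add(record._1() + " -> " + record._2()._1() + ", " + record._2()._2());
            }
            return lines;
        }
    }

    public static void main(String[] args) {
        // Printing happens here on the driver, so the output appears in the driver's console.
        firstLines(2).forEach(System.out::println);
    }
}
```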


To map each (K, (V, W)) to something else, such as (K, VW), use the mapValues() transformation mentioned above:

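import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

// Turn each (lat, long) value into a "lat-long" string; keys and partitioning are unchanged.
JavaPairRDD<Integer, String> pairs = latlong.mapValues(
        new Function<Tuple2<Double, Double>, String>() {
          @Override
          public String call(Tuple2<Double, Double> value) throws Exception {
            return value._1() + "-" + value._2();
          }
        });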
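On Java 8 and later, Spark's function interfaces can be implemented with lambdas, which removes the anonymous-class boilerplate. A self-contained sketch of the same "lat-long" formatting in that style, assuming Spark's Java API on the classpath; the class LambdaMapValuesDemo, the helper formatPairs() and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class LambdaMapValuesDemo {

    // Hypothetical helper: joins two pair RDDs and formats each value as "lat-long".
    static Map<Integer, String> formatPairs() {
        SparkConf conf = new SparkConf().setAppName("lambda-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, Double> lat1 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 10.0), new Tuple2<>(2, 20.0)));
            JavaPairRDD<Integer, Double> lat2 = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(1, 1.5), new Tuple2<>(2, 2.5)));
            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat1.join(lat2);

            // The lambda implements org.apache.spark.api.java.function.Function, which is
            // Serializable, so Spark can ship it to the executors.
            JavaPairRDD<Integer, String> pairs =
                    latlong.mapValues(value -> value._1() + "-" + value._2());

            return new HashMap<>(pairs.collectAsMap());
        }
    }

    public static void main(String[] args) {
        System.out.println(formatPairs());
    }
}
```

The behaviour matches the anonymous Function version; only the syntax differs.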
