I am trying to join two pair RDDs as shown below, where:
lat1: (K, V) -> K is Integer, V is Double
lat2: (K, V) -> K is Integer, V is Double
JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat.join(lon);
I am assuming the new RDD will have the form (K, [V1, V2]), and I want to display it.
Also, if I want to perform operations based on the values, how should I do that?
Please suggest a solution using the Spark Java API.
PS: I have seen many answers in Scala, but my requirement is to implement this in Java.
From the Spark documentation:
When join is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.
So your assumption is correct (note that long is a reserved word in Java, so the second RDD is named lon here):
JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat.join(lon);
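To make "all pairs of elements for each key" concrete, here is a minimal plain-Java sketch that only models the documented result of a join. It does not use Spark and is not how Spark implements join. The class name, the helper methods and the sample coordinates are all made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map.Entry;

public class JoinSemanticsSketch {

    // Small helper so the sample data below stays readable.
    static <K, V> Entry<K, V> entry(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Models the documented result of join: one (K, (V, W)) entry for every
    // pair of elements that share a key. Keys present on only one side are dropped.
    static <K, V, W> List<Entry<K, Entry<V, W>>> join(
            List<Entry<K, V>> left, List<Entry<K, W>> right) {
        List<Entry<K, Entry<V, W>>> out = new ArrayList<>();
        for (Entry<K, V> l : left) {
            for (Entry<K, W> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    Entry<V, W> pair = entry(l.getValue(), r.getValue());
                    out.add(entry(l.getKey(), pair));
                }
            }
        }
        return out;
    }

    // Hypothetical sample data: key 2 occurs twice on the left side,
    // key 1 exists only on the left and key 3 only on the right.
    static List<Entry<Integer, Entry<Double, Double>>> demo() {
        List<Entry<Integer, Double>> lat = Arrays.asList(
                entry(1, 12.97), entry(2, 48.85), entry(2, 48.86));
        List<Entry<Integer, Double>> lon = Arrays.asList(
                entry(2, 2.35), entry(3, -0.12));
        return join(lat, lon);
    }

    public static void main(String[] args) {
        for (Entry<Integer, Entry<Double, Double>> e : demo()) {
            System.out.println("(" + e.getKey() + ", (" + e.getValue().getKey()
                    + ", " + e.getValue().getValue() + "))");
        }
        // Prints:
        // (2, (48.85, 2.35))
        // (2, (48.86, 2.35))
    }
}
```

Keys 1 and 3 disappear because they have no partner on the other side, and key 2 yields two results because it occurs twice on the left. So the value of each joined record is a single (V, W) pair, not a list of values.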
When you need to work with the values in a JavaPairRDD, you can use the mapValues() method:
Pass each value in the key-value pair RDD through a map function without changing the keys; this also retains the original RDD's partitioning.
To display the JavaPairRDD, you can use the usual output methods, e.g. saveAsTextFile().
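If the joined RDD is small, another common way to look at it is to bring the records to the driver with collect() and print them. The sketch below assumes Spark and Java 8 on the classpath and a local master ("local[*]"). The class name, the helper methods and the sample coordinates are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class DisplayPairRddSketch {

    // Builds a small joined RDD from hypothetical sample data.
    static JavaPairRDD<Integer, Tuple2<Double, Double>> sample(JavaSparkContext sc) {
        List<Tuple2<Integer, Double>> latData = Arrays.asList(
                new Tuple2<Integer, Double>(1, 12.97),
                new Tuple2<Integer, Double>(2, 48.85));
        List<Tuple2<Integer, Double>> lonData = Arrays.asList(
                new Tuple2<Integer, Double>(2, 2.35),
                new Tuple2<Integer, Double>(3, -0.12));
        JavaPairRDD<Integer, Double> lat = sc.parallelizePairs(latData);
        JavaPairRDD<Integer, Double> lon = sc.parallelizePairs(lonData);
        return lat.join(lon);
    }

    // Renders every record as "key: (lat, lon)" on the driver.
    // collect() loads the whole RDD into driver memory, so this suits small
    // data only; for a large RDD, take(n) or saveAsTextFile() is safer.
    static List<String> render(JavaPairRDD<Integer, Tuple2<Double, Double>> latlong) {
        List<String> lines = new ArrayList<>();
        for (Tuple2<Integer, Tuple2<Double, Double>> rec : latlong.collect()) {
            lines.add(rec._1() + ": (" + rec._2()._1() + ", " + rec._2()._2() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Local master chosen only so the sketch runs without a cluster.
        SparkConf conf = new SparkConf()
                .setAppName("DisplayPairRddSketch")
                .setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            for (String line : render(sample(sc))) {
                System.out.println(line);
            }
        }
    }
}
```

With this sample data only key 2 is present in both inputs, so a single record is printed.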
When you need to map the values in (K, (V, W)) to something else, like (K, VW), you can use the mapValues() transformation mentioned above:
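import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

// Turn each (V, W) value into a single "V-W" string; the keys are left unchanged.
JavaPairRDD<Integer, String> pairs = latlong.mapValues(
    new Function<Tuple2<Double, Double>, String>() {
        @Override
        public String call(Tuple2<Double, Double> value) throws Exception {
            return value._1() + "-" + value._2();
        }
    });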
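If you are on Java 8 or later, the same transformation can be written as a lambda, and other value-based operations such as filter() follow the same pattern. This is only a sketch under those assumptions: the class name, the method names, the latitude threshold and the sample coordinates are invented for illustration, and the local master is used only so it runs without a cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ValueOpsSketch {

    // Same mapping as the anonymous Function above, written as a lambda:
    // (K, (V, W)) becomes (K, "V-W").
    static JavaPairRDD<Integer, String> format(
            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong) {
        return latlong.mapValues(v -> v._1() + "-" + v._2());
    }

    // Value-based selection: keep only records whose first value (the latitude)
    // is greater than minLat. The threshold is a hypothetical parameter.
    static JavaPairRDD<Integer, Tuple2<Double, Double>> northOf(
            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong, double minLat) {
        return latlong.filter(rec -> rec._2()._1() > minLat);
    }

    // Builds a small joined RDD from hypothetical sample data.
    static JavaPairRDD<Integer, Tuple2<Double, Double>> sample(JavaSparkContext sc) {
        List<Tuple2<Integer, Double>> latData = Arrays.asList(
                new Tuple2<Integer, Double>(1, 12.97),
                new Tuple2<Integer, Double>(2, 48.85));
        List<Tuple2<Integer, Double>> lonData = Arrays.asList(
                new Tuple2<Integer, Double>(1, 77.59),
                new Tuple2<Integer, Double>(2, 2.35));
        return sc.parallelizePairs(latData).join(sc.parallelizePairs(lonData));
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("ValueOpsSketch")
                .setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = sample(sc);
            // collectAsMap() is convenient here because the sample keys are unique.
            Map<Integer, String> formatted = format(northOf(latlong, 40.0)).collectAsMap();
            System.out.println(formatted);
        }
    }
}
```

mapValues() is the better fit when only the value changes, because it keeps the keys and the partitioning. filter() receives the whole (key, value) tuple, which is why the lambda reaches the latitude through rec._2()._1().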