I have two RDDs: points and pointsWithinEps. Their content is shown in the figures below: each Vector represents an x, y coordinate. pointsWithinEps holds pairs of points and the distance between them. I want to loop over all points and, for every point, keep only those elements of pointsWithinEps that have that point as the x (first) coordinate. So for the first point it should yield vectors [0] and [1] from pointsWithinEps. I have the following code:
for (i <- 0 until points.count.toInt) {
  val p = points.take(i + 1).drop(i)
  val currentPointNeighbours = pointsWithinEps.filter {
    case ((x, y), distance) => x == p
  }
  currentPointNeighbours.foreach(println)
  println("----")
}
It does not work correctly. What is wrong with the code?
Two things are wrong in your code. First, `points.take(i + 1).drop(i)` returns an `Array[Point]`, not a single point, so the comparison `x == p` compares a point against an array and never matches. Second, calling `count` and `take` inside a driver-side loop launches a Spark job per point, which is very slow. You can do this efficiently if you transform your RDDs into key-value RDDs and then join them on the key. For points, the key is the point itself; for distances, the key is the first point:
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.DenseVector

type Point = DenseVector
type Distance = ((Point, Point), Double)

val points: RDD[Point] = ???
val pointsWithinEps: RDD[Distance] = ???

// Prepare Tuple2 RDDs to enable Spark's pair-RDD functions
val pointsToKV: RDD[(Point, Unit)] = points.map(p => p -> ())
val distanceKV: RDD[(Point, Distance)] = pointsWithinEps.map(d => d._1._1 -> d)

// Join points with distances on the key, then drop the keys
val filtered: RDD[Distance] = pointsToKV.join(distanceKV).map(_._2._2)
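To see what the join computes, here is the same keying-and-join logic on plain Scala collections, with no Spark required. The sample coordinates are made up, and `Point` is a stand-in `(Double, Double)` pair instead of `DenseVector`:

```scala
// Stand-ins for the Spark types: Point as an (x, y) pair instead of DenseVector
type Point = (Double, Double)
type Distance = ((Point, Point), Double)

// Hypothetical sample data: three points, two close pairs
val points: Seq[Point] = Seq((0.0, 0.0), (1.0, 0.0), (5.0, 5.0))
val pointsWithinEps: Seq[Distance] = Seq(
  (((0.0, 0.0), (1.0, 0.0)), 1.0),
  (((1.0, 0.0), (0.0, 0.0)), 1.0)
)

// Key each distance record by its first point, as the RDD version does
val distanceKV: Seq[(Point, Distance)] = pointsWithinEps.map(d => d._1._1 -> d)

// A join on key keeps only distances whose key appears among the points
val pointSet = points.toSet
val filtered: Seq[Distance] = distanceKV.collect { case (k, d) if pointSet(k) => d }

filtered.foreach(println)
```

Each point's neighbours can then be grouped in one pass with `filtered.groupBy(_._1._1)`, instead of one filter pass per point.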