I have an RDD of LabeledPoints. Is it possible to select a subset of it based on a list of indices?
For example, with idx=[0,4,5,6,8], I'd like to be able to get a new RDD with elements 0, 4, 5, 6 and 8.
Note that I am not interested in random sampling, which is already available.
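Yes, you can either:
1. Key the RDD and the list of values by the same number and join them.
2. Filter the RDD directly against the list of values.
Choose 1 if the list of values is large, else 2.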
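For a short list, one common way to do this is to number the elements with zipWithIndex and filter against a broadcast set of the wanted positions. The following is only a minimal sketch under that assumption: the names FilterByIndex and selectByIndex are made up for illustration, and the element type is left generic so the same code would apply to an RDD of LabeledPoints.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper object, not part of Spark.
object FilterByIndex {
  // Keep only the elements whose position (as assigned by zipWithIndex) is in idx.
  def selectByIndex[T: ClassTag](data: RDD[T], idx: Seq[Long]): RDD[T] = {
    // Ship the small index set to every executor once.
    val wanted = data.sparkContext.broadcast(idx.toSet)
    data.zipWithIndex()
      .filter { case (_, i) => wanted.value.contains(i) }
      .map { case (elem, _) => elem }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("filter-by-index").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Stand-in data; replace with your RDD of LabeledPoints.
    val data = sc.parallelize((0 until 10).map(i => s"p$i"))
    val subset = selectByIndex(data, Seq(0L, 4L, 5L, 6L, 8L))
    println(subset.collect().mkString(", "))
    sc.stop()
  }
}
```

Note that zipWithIndex numbers elements by partition order, so the positions are only meaningful if the RDD has a stable ordering (for example, it was read from a file or built with parallelize).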
Edit to show a code sample for case 1.
// Assumes idx is the local list from the question; alternatively, read the
// values the same way you read your points.
val filteringValues = sc.parallelize(idx)
  .keyBy(identity) // Key each value by itself
val filtered = parsedData
  .keyBy(_.something) // Get the number from your inner structure
  .rightOuterJoin(filteringValues) // This selects only the keys in your subset
  .flatMap(x => x._2._1) // Drop unmatched keys and map back to the original type
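If the points do not carry a number of their own, the position can be used as the key instead. Below is a minimal, self-contained sketch of the join approach under that assumption. JoinByIndex and selectByIndex are hypothetical names, and it uses a plain inner join rather than the rightOuterJoin plus flatMap above, which should give the same elements when every wanted index exists.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper object, not part of Spark.
object JoinByIndex {
  // Select the elements of data whose zipWithIndex position appears in idx.
  def selectByIndex[T: ClassTag](data: RDD[T], idx: RDD[Long]): RDD[T] = {
    // Key every element by its position.
    val keyedData = data.zipWithIndex().map { case (elem, i) => (i, elem) }
    // Key every wanted index by itself; distinct avoids duplicated output rows.
    val keyedIdx = idx.distinct().map(i => (i, ()))
    // The inner join keeps only positions present on both sides.
    keyedData.join(keyedIdx).map { case (_, (elem, _)) => elem }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join-by-index").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Stand-in data; replace with your RDD of LabeledPoints.
    val data = sc.parallelize((0 until 10).map(i => s"p$i"))
    val idx = sc.parallelize(Seq(0L, 4L, 5L, 6L, 8L))
    selectByIndex(data, idx).collect().foreach(println)
    sc.stop()
  }
}
```

The join shuffles both sides, so the result is not guaranteed to come back in index order; sort by the key before dropping it if order matters.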