Apache Spark (Scala) - print 1 entry of an RDD / pairRDD

I have an RDD whose items I have grouped by key:

    val pairRDD = oldRDD.map(x => (x.user, x.product)).groupByKey

pairRDD is of type RDD[(Int, Iterable[Int])].

What I am having trouble with is simply accessing a particular element. What is the point of having a key if I seemingly can't access an item in the RDD by that key?

At the moment I filter the RDD down to a single item. However, the result is still an RDD, so I have to call foreach on it to print it out:

    val userNumber10 = pairRDD.filter(_._1 == 10)
    userNumber10.foreach(x => println("user number = " + x._1))

Alternatively, I can filter the RDD and then take(1) which returns an Array of size 1:

    val userNumber10Array = pairRDD.filter(_._1 == 10).take(1)

Or I can go one step further and select the first element of the returned array:

    val userNumber10Pair = pairRDD.filter(_._1 == 10).take(1)(0)

This returns the pair as required. But it is clearly inconvenient, and I would guess that this is not how an RDD is meant to be used!

Why am I doing this, you may ask? I simply wanted to "see" what was in my RDD for my own testing purposes. So, is there a way to access individual items in an RDD (more strictly, a pairRDD), and if so, how? If not, what is the purpose of a pairRDD?

Use the lookup function, which belongs to PairRDDFunctions. From the official documentation:

Return the list of values in the RDD for key key. This operation is done efficiently if the RDD has a known partitioner by only searching the partition that the key maps to.

https://spark.apache.org/docs/0.8.1/api/core/org/apache/spark/rdd/PairRDDFunctions.html
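As a rough sketch of how lookup might be applied to the pairRDD from the question: the Purchase case class, the sample data, and the local master setting below are assumptions made for illustration, not part of the original post. Because lookup returns a Seq with one entry per occurrence of the key, on a grouped RDD it yields a Seq holding at most one Iterable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LookupExample {
  // Hypothetical record type standing in for the elements of oldRDD.
  case class Purchase(user: Int, product: Int)

  // Same transformation as in the question: key by user, then group.
  def groupByUser(oldRDD: RDD[Purchase]): RDD[(Int, Iterable[Int])] =
    oldRDD.map(x => (x.user, x.product)).groupByKey()

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("lookup-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data.
      val oldRDD = sc.parallelize(Seq(Purchase(10, 1), Purchase(10, 2), Purchase(11, 3)))
      val pairRDD = groupByUser(oldRDD)

      // lookup returns the values for the key as a local Seq on the driver.
      val productsForUser10: Seq[Iterable[Int]] = pairRDD.lookup(10)
      productsForUser10.foreach { products =>
        println("user number = 10, products = " + products.mkString(", "))
      }

      // A key that is not present yields an empty Seq rather than an error.
      println("entries for user 99: " + pairRDD.lookup(99).size)
    } finally {
      sc.stop()
    }
  }
}
```

The result of groupByKey carries a partitioner, so this is the case the documentation describes as efficient: only the partition that key 10 maps to is searched.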

And if you just want to see the contents of your RDD, you can simply call collect, which returns all of its elements to the driver as an array.
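A minimal sketch of inspecting an RDD on the driver, assuming made-up sample data, a local master, and a hypothetical formatEntry helper. Since collect copies every element to the driver, take(n) or first() may be the safer choice for a large RDD. Note also that calling foreach(println) directly on an RDD runs on the executors, so on a cluster the output may not appear in the driver's console.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object InspectExample {
  // Hypothetical helper: renders one (user, products) entry as text.
  // Products are sorted so the output does not depend on grouping order.
  def formatEntry(entry: (Int, Iterable[Int])): String =
    "user " + entry._1 + " -> products " + entry._2.toSeq.sorted.mkString(", ")

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("inspect-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data, already keyed by user.
      val pairRDD = sc.parallelize(Seq((10, 1), (10, 2), (11, 3))).groupByKey()

      // collect brings the whole RDD to the driver as a local Array.
      pairRDD.collect().foreach(entry => println(formatEntry(entry)))

      // take(n) fetches at most n elements, which is cheaper for a quick peek.
      pairRDD.take(1).foreach(entry => println(formatEntry(entry)))

      // first() returns a single element directly instead of an Array.
      println(formatEntry(pairRDD.first()))
    } finally {
      sc.stop()
    }
  }
}
```

All three calls return ordinary local Scala values, so the usual collection methods apply to the results.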
