Spark: function to print RDD[A]

I am writing a function that receives an RDD and an integer n, and prints the first n elements of the received RDD.

The RDD parameter doesn't have a predetermined element type and, using pattern matching, I want to print it differently depending on the RDD's type.

For instance, if I have myRDD: RDD[(String, Array[String])] and I call printRddContent(myRDD), I would like to print it this way (outside a function, this works well):

anRdd.map { case (a, arr) => (a, arr.toList) }.collect().take(n).foreach(println)

And so on, with different patterns.
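For example (just an illustrative sketch, with a made-up otherRdd: RDD[(String, Int)] that isn't part of my real code), a different element type would get its own formatting:

// hypothetical second RDD type, formatted and printed in its own way
otherRdd.map { case (name, count) => s"$name -> $count" }.collect().take(n).foreach(println)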

So far, this is my code:

  def printRddContent[A](anRdd: RDD[A], n: Int) = {
    anRdd match {
      case r1: RDD[(String, Array[String])] => anRdd.map { case (a, arr) => (a, arr.toList) }.take(n).foreach(println)
      case _ => "case clause"
    }
  }

But the .toList call shows an error: Cannot resolve symbol toList. I don't understand why this isn't working inside the function.

Here's a solution based on the code you provided:

  def printRddContent[A](anRdd: RDD[A], n: Int) = {
    anRdd match {
      case r1: RDD[(String, Array[String])] =>
        // cast so the compiler knows the element type, then format and print
        r1.asInstanceOf[RDD[(String, Array[String])]]
          .map { case (a, arr) => (a, arr.toList) }
          .take(n)
          .foreach(println)
      case _ => "case clause"
    }
  }

In this case it's reasonable to use asInstanceOf: the cast is what gives the compiler the concrete element type, so arr.toList resolves. Keep in mind, though, that due to JVM type erasure the pattern match only verifies that the value is an RDD, not what its elements are, so the cast is only truly safe when you know the RDD really holds (String, Array[String]) pairs.
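For completeness, here is a minimal usage sketch of the function above; the SparkSession setup and the sample data are assumptions for illustration, not part of the original post, and printRddContent is assumed to be in scope as defined above:

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().master("local[*]").appName("print-rdd-demo").getOrCreate()
  val sc = spark.sparkContext

  // sample RDD matching the (String, Array[String]) case (made-up data)
  val myRDD: RDD[(String, Array[String])] =
    sc.parallelize(Seq(("a", Array("1", "2")), ("b", Array("3"))))

  printRddContent(myRDD, 2)
  // expected output:
  // (a,List(1, 2))
  // (b,List(3))

  spark.stop()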
