
Spark: function to print RDD[A]

I am writing a function that receives an RDD and an integer n, and prints the first n elements of that RDD.

The RDD parameter doesn't have a predetermined element type, so I wanted to use pattern matching to print it differently depending on what kind of RDD it is.

For instance, suppose I have myRDD: RDD[(String, Array[String])]. When I call printRddContent(myRDD), I would like it to be printed this way (outside a function, this works well):

anRdd.map { case (a, arr) => (a, arr.toList) }.collect().take(n).foreach(println)

And so on, with different patterns.

So far, this is my code:

  def printRddContent[A](anRdd: RDD[A], n: Int) = {
    anRdd match {
      case r1: RDD[(String, Array[String])] => anRdd.map { case (a, arr) => (a, arr.toList) }.take(n).foreach(println)
      case _ => "case clause"
    }
  }

But the .toList call is flagged with the message: Cannot resolve symbol toList. I don't understand why this doesn't work inside the function.

Here's a solution based on the code you provided:

  def printRddContent[A](anRdd: RDD[A], n: Int) = {
    anRdd match {
      case r1: RDD[(String, Array[String])] => r1.asInstanceOf[RDD[(String, Array[String])]].map { case (a, arr) => (a, arr.toList) }.take(n).foreach(println)
      case _ => "case clause"
    }
  }

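Inside the case branch, the original code kept calling anRdd, whose static type is still RDD[A], so the compiler knows nothing about arr and cannot resolve toList. Working on r1 with asInstanceOf (or simply on r1, which already has the matched type) gives the compiler the element type. Be aware that the cast is only as safe as the match itself: because of type erasure, the pattern checks at runtime that the value is an RDD, not what its element type is. The branch is therefore taken for any RDD, and one whose elements are not (String, Array[String]) pairs will fail at runtime, typically with a ClassCastException, when the elements are processed.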
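Since a type pattern on the RDD's element type cannot be checked at runtime, one alternative is to fetch the first n elements and match on the elements themselves, where the checks on tuples and arrays are real. The following is only a sketch under assumptions: the object name RddPrinting and the helper render are hypothetical names introduced here, not part of the original question or answer, and the patterns shown are just two examples of the "different patterns" the question mentions.

```scala
import org.apache.spark.rdd.RDD

object RddPrinting {

  // Hypothetical helper: turns one element into a printable string.
  // Arrays are converted to lists because Array.toString only shows a reference.
  def render(elem: Any): String = elem match {
    case (key, arr: Array[_]) => (key, arr.toList).toString // e.g. (String, Array[String]) pairs
    case arr: Array[_]        => arr.toList.toString        // bare arrays
    case other                => String.valueOf(other)      // everything else, including null
  }

  // take(n) brings at most n elements to the driver, so the matching and
  // printing happen locally on an ordinary Array[A].
  def printRddContent[A](anRdd: RDD[A], n: Int): Unit =
    anRdd.take(n).map(render).foreach(println)
}
```

Assuming an existing SparkContext named sc, a call such as RddPrinting.printRddContent(sc.parallelize(Seq(("a", Array("x", "y")))), 1) would go through the first pattern. Calling take(n) directly also avoids the collect().take(n) form from the question, which pulls the whole RDD to the driver before discarding most of it.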
