
value toDS is not a member of org.apache.spark.rdd.RDD

I am trying to write a sample Apache Spark program that converts an RDD to a Dataset, but in the process I am getting a compile-time error.

Here is my sample code and the error:

Code:

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.sql.Dataset

object Hello {

  case class Person(name: String, age: Int)

  def main(args: Array[String]){
    val conf = new SparkConf()
      .setAppName("first example")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val peopleRDD: RDD[Person] = sc.parallelize(Seq(Person("John", 27)))
    val people = peopleRDD.toDS
  }
}

And my error is:

value toDS is not a member of org.apache.spark.rdd.RDD[Person]

I have added the Spark Core and Spark SQL jars.

And my versions are:

Spark 1.6.2

Scala 2.10

Spark version < 2.x

toDS becomes available once you import sqlContext.implicits._:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val people = peopleRDD.toDS()

Spark version >= 2.x

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder
  .config(conf)
  .getOrCreate()

import spark.implicits._
val people = peopleRDD.toDS()

HIH

There are two mistakes I can see in your code.

First, you have to import sqlContext.implicits._, because toDS and toDF are defined in the implicits of sqlContext.

Second, the case class should be defined outside the class scope in which it is used; otherwise a task not serializable exception will occur.

The complete solution is as follows:

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{Dataset, SQLContext}

    object Hello {
      def main(args: Array[String]) {
        val conf = new SparkConf()
          .setAppName("first example")
          .setMaster("local")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // toDS is provided by the implicits of SQLContext
        import sqlContext.implicits._
        val peopleRDD: RDD[Person] = sc.parallelize(Seq(Person("John", 27)))
        val people = peopleRDD.toDS
        people.show(false)
      }
    }

    // Defined at the top level, outside the enclosing object
    case class Person(name: String, age: Int)

The exact answer is that you are importing both:

import spark.implicits._ 

import sqlContext.implicits._ 

This is what causes the issue; remove either one of them and you won't face an issue like this.
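For illustration only (not part of the original answers), here is a minimal sketch of a self-contained Spark 2.x program that keeps just the single spark.implicits._ import; the object and app names simply mirror the question's code:

import org.apache.spark.sql.SparkSession

// Case class defined at the top level, as recommended in the answer above
case class Person(name: String, age: Int)

object Hello {
  def main(args: Array[String]): Unit = {
    // In Spark 2.x a single SparkSession replaces SQLContext
    val spark = SparkSession.builder
      .appName("first example")
      .master("local")
      .getOrCreate()

    // Import only one set of implicits; this is what brings toDS into scope
    import spark.implicits._

    val peopleRDD = spark.sparkContext.parallelize(Seq(Person("John", 27)))
    val people = peopleRDD.toDS()
    people.show()

    spark.stop()
  }
}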
