
value toDF is not a member of org.apache.spark.rdd.RDD

I've read about this issue in other SO posts and I still don't know what I'm doing wrong. In principle, adding these two lines:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

should have done the trick, but the error persists.

This is my build.sbt:

name := "PickACustomer"

version := "1.0"

scalaVersion := "2.11.7"


libraryDependencies ++= Seq("com.databricks" %% "spark-avro" % "2.0.1",
"org.apache.spark" %% "spark-sql" % "1.6.0",
"org.apache.spark" %% "spark-core" % "1.6.0")

and my Scala code is:

import scala.collection.mutable.Map
import scala.collection.immutable.Vector

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._


    object Foo{

    def reshuffle_rdd(rawText: RDD[String]): RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]]  = {...}

    def do_prediction(shuffled:RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]], prediction:(Vector[(Double, Double, String)] => Map[String, Double]) ) : RDD[Map[String, Double]] = {...}

    def get_match_rate_from_results(results : RDD[Map[String, Double]]) : Map[String, Double]  = {...}


    def retrieve_duid(element: Map[String,(Vector[(Double, Double, String)], Map[String,Double])]): Double = {...}




    def main(args: Array[String]){
        val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
        if (!conf.getOption("spark.master").isDefined) conf.setMaster("local")

        val sc = new SparkContext(conf)

        //This should do the trick
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)
        import sqlContext.implicits._

        val PATH_FILE = "/mnt/fast_export_file_clean.csv"
        val rawText = sc.textFile(PATH_FILE)
        val shuffled = reshuffle_rdd(rawText)

        // PREDICT AS A FUNCTION OF THE LAST SEEN UID
        val results = do_prediction(shuffled.filter(x => retrieve_duid(x) > 1) , predict_as_last_uid)
        results.cache()

        case class Summary(ismatch: Double, t_to_last:Double, nflips:Double,d_uid: Double, truth:Double, guess:Double)

        val summary = results.map(x => Summary(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))


        //PROBLEMATIC LINE
        val sum_df = summary.toDF()

    }
    }

I always get:

value toDF is not a member of org.apache.spark.rdd.RDD[Summary]

Bit lost now. Any ideas?

Move your case class outside of main:

object Foo {

  case class Summary(ismatch: Double, t_to_last:Double, nflips:Double,d_uid: Double, truth:Double, guess:Double)

  def main(args: Array[String]){
    ...
  }

}

Something about its scoping prevents Spark from automatically deriving the schema for Summary. FYI, I actually got a different error from sbt:

No TypeTag available for Summary
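For reference, here is a minimal sketch of the working arrangement under Spark 1.6 with SQLContext; the dummy results data is an assumption standing in for the question's real RDD:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object Foo {

  // Defined at object level, Summary gets a TypeTag,
  // which Spark needs to derive the DataFrame schema.
  case class Summary(ismatch: Double, t_to_last: Double, nflips: Double,
                     d_uid: Double, truth: Double, guess: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PickACustomer").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Dummy data standing in for the real `results` RDD.
    val results = sc.parallelize(Seq(
      Map("match" -> 1.0, "t_to_last" -> 2.0, "nflips" -> 3.0,
          "d_uid" -> 4.0, "truth" -> 1.0, "guess" -> 1.0)
    ))

    val summary = results.map(x =>
      Summary(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))

    val sum_df = summary.toDF()   // compiles now that Summary is a top-level member
    sum_df.show()
  }
}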

Great, saved my life.


Move your case class outside of the function body. Then use import spark.implicits._.
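A minimal sketch of the spark.implicits._ variant, assuming Spark 2.x where SparkSession is available (the question itself targets Spark 1.6, where import sqlContext.implicits._ is the equivalent):

import org.apache.spark.sql.SparkSession

object Foo {

  // Top-level case class so Spark can derive the schema.
  case class Summary(ismatch: Double, truth: Double, guess: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PickACustomer")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._   // brings toDF into scope for RDDs of case classes

    val summary = spark.sparkContext.parallelize(
      Seq(Summary(1.0, 1.0, 1.0), Summary(0.0, 1.0, 0.0)))

    summary.toDF().show()
  }
}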
