
I keep getting the error: value toDF is not a member of org.apache.spark.rdd.RDD

I wrote "import sqlContext.implicits._", but it still doesn't work. It works in spark-shell, so why not in this case? I have seen many other ways of converting an RDD to a DataFrame, but most of my code is already written with toDF(). How can I make toDF work? The error comes from the code below:

import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DoubleType
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import sys.process._

"rm -f ./ml-1m.zip".!
"wget http://files.grouplens.org/datasets/movielens/ml-1m.zip".!

"ls ./ml-1m.zip".!

"rm -r ./ml-1m".!
"unzip ml-1m.zip".!

"ls ./ml-1m".!

val ratings_raw = sc.textFile("./ml-1m/ratings.dat")
ratings_raw.takeSample(false,10, seed=0).foreach(println)

case class Rating(userId: Int, movieId: Int, rating: Float)
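// the next line fails with: value toDF is not a member of org.apache.spark.rdd.RDD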
val ratings = ratings_raw.map(x => x.split("::")).map(r => Rating(r(0).toInt, r(1).toInt, r(2).toFloat)).toDF().na.drop()

If you are using spark-shell, you do not need to create a new SQLContext with

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

You can use spark directly:


scala> import spark.implicits._

scala> val ratings_raw = sc.textFile("./ml-1m/ratings.dat")
ratings_raw: org.apache.spark.rdd.RDD[String] = ./ml-1m/ratings.dat MapPartitionsRDD[1] at textFile at <console>:38

scala> case class Rating(userId: Int, movieId: Int, rating: Float)
defined class Rating

scala> val ratings = ratings_raw.map(x => x.split("::")).map(r => Rating(r(0).toInt, r(1).toInt, r(2).toFloat)).toDF().na.drop()
ratings: org.apache.spark.sql.DataFrame = [userId: int, movieId: int ... 1 more field]

scala> ratings
res3: org.apache.spark.sql.DataFrame = [userId: int, movieId: int ... 1 more field]

scala> ratings.printSchema
root
 |-- userId: integer (nullable = false)
 |-- movieId: integer (nullable = false)
 |-- rating: float (nullable = false)
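
If the implicit conversion still does not resolve for some reason, a sketch of an alternative that avoids toDF altogether (assuming the same spark-shell session and file layout as above) is to pass the typed RDD to createDataFrame on the session:

// Alternative that does not rely on the toDF implicit:
// SparkSession.createDataFrame accepts an RDD of a case class directly.
case class Rating(userId: Int, movieId: Int, rating: Float)
val ratingsRdd = sc.textFile("./ml-1m/ratings.dat").map(_.split("::")).map(r => Rating(r(0).toInt, r(1).toInt, r(2).toFloat))
val ratings = spark.createDataFrame(ratingsRdd).na.drop()
ratings.printSchema()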

I tried your code and it works fine!

However, I used a SparkSession like this

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
            .master("local")
            .appName("test1")
            .getOrCreate()

instead of the deprecated

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
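
For completeness, here is a minimal standalone sketch of the same conversion built around SparkSession (the object name, master and path are placeholder assumptions). The two details that usually make toDF resolve in compiled code are defining the case class outside the method and importing the implicits from the concrete spark value:

import org.apache.spark.sql.SparkSession

// Case class defined at the top level (not inside main),
// so Spark can derive an encoder for it.
case class Rating(userId: Int, movieId: Int, rating: Float)

object RatingsToDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")          // placeholder master for local testing
      .appName("toDF-example")     // placeholder app name
      .getOrCreate()

    // toDF() becomes available on RDDs of case classes through this import;
    // it must come from the SparkSession instance, not from the class.
    import spark.implicits._

    val ratings = spark.sparkContext
      .textFile("./ml-1m/ratings.dat")
      .map(_.split("::"))
      .map(r => Rating(r(0).toInt, r(1).toInt, r(2).toFloat))
      .toDF()
      .na.drop()

    ratings.printSchema()
    spark.stop()
  }
}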
