Spark 2.0 - Convert DataFrame to DataSet
I want to load my data and run a basic linear regression on it. So first, I need to use VectorAssembler to produce the features column. However, when I call assembler.transform(df), df is a DataFrame, and it expects a DataSet. I tried df.toDS, but that gives the error "value toDS is not a member of org.apache.spark.sql.DataFrame". Indeed, it is a member of org.apache.spark.sql.DatasetHolder.

What am I getting wrong here?
package main.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.DatasetHolder
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

object Analyzer {
  def main(args: Array[String]): Unit = {
    // Spark 1.6-style entry points: SparkContext plus SQLContext
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Load the tab-separated files and let Spark infer the column types
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "false")
      .option("delimiter", "\t")
      .option("parserLib", "UNIVOCITY")
      .option("inferSchema", "true")
      .load("data/snap/*")

    // Combine the predictor columns into a single "features" vector column
    val assembler = new VectorAssembler()
      .setInputCols(Array("own", "want", "wish", "trade", "comment"))
      .setOutputCol("features")

    // This is the call that expects a Dataset
    val df1 = assembler.transform(df)

    val formula = new RFormula()
      .setFormula("rank ~ own + want + wish + trade + comment")
      .setFeaturesCol("features")
      .setLabelCol("rank")
  }
}
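For reference on the error message: toDS is added by the implicits import to local Seqs and RDDs (through DatasetHolder), not to a DataFrame. In Spark 2.x, DataFrame is a type alias for Dataset[Row], and as[T] is the call that turns one into a typed Dataset[T]. A minimal sketch, assuming Spark 2.x on the classpath; the Item case class and the ToDsSketch object are hypothetical names made up for this illustration:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
// It is declared at the top level so Spark can derive an encoder for it.
case class Item(own: Int, want: Int)

object ToDsSketch {
  // Builds a small Dataset, goes through a DataFrame, and comes back typed.
  def typedItems(spark: SparkSession): Dataset[Item] = {
    import spark.implicits._

    // toDS comes from the implicit Seq -> DatasetHolder conversion.
    val fromSeq: Dataset[Item] = Seq(Item(1, 2), Item(3, 4)).toDS()

    // A DataFrame is already a Dataset[Row], so this assignment
    // compiles without any conversion call.
    val df: DataFrame = fromSeq.toDF()
    val untyped: Dataset[Row] = df

    // as[T] gives a typed view when the column names and types
    // match the fields of the case class.
    untyped.as[Item]
  }

  def main(args: Array[String]): Unit = {
    // Local session so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toDS-sketch")
      .getOrCreate()

    typedItems(spark).show()
    spark.stop()
  }
}
```

Methods declared to take a Dataset[_], such as VectorAssembler.transform, should therefore accept a DataFrame as-is once the code is compiled against Spark 2.x.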
Apparently the problem was that I was still using the Spark 1.6 style of SparkContext and SQLContext. I changed to SparkSession, and transform() was able to accept the DataFrame implicitly.
package main.scala

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.Dataset
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.feature.RFormula
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

object Analyzer {
  def main(args: Array[String]): Unit = {
    // Spark 2.0 entry point, replacing SparkContext and SQLContext
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // Load the tab-separated files and let Spark infer the column types
    val df = spark.read
      .format("com.databricks.spark.csv")
      .option("header", "false")
      .option("delimiter", "\t")
      .option("parserLib", "UNIVOCITY")
      .option("inferSchema", "true")
      .load("data/snap/*")
    df.show()

    // Combine the predictor columns into a single "features" vector column
    val assembler = new VectorAssembler()
      .setInputCols(Array("own", "want", "wish", "trade", "comment"))
      .setOutputCol("features")

    // transform() now takes the DataFrame directly
    val df1 = assembler.transform(df)
  }
}
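One thing worth checking in the code above: with header set to false, the CSV reader in Spark 2.x assigns positional default names such as _c0, _c1, and so on, as far as I can tell, so VectorAssembler would not find columns called own, want, etc. unless they are renamed first. A minimal sketch of renaming with toDF, assuming six numeric columns; the column order below is a guess made up for illustration and must be checked against the real files, and NamedColumnsSketch is a hypothetical name:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object NamedColumnsSketch {
  // ASSUMPTION: this column order is illustrative only; the real
  // layout of the input files is not given in the post.
  val columnNames: Seq[String] =
    Seq("rank", "own", "want", "wish", "trade", "comment")

  // Renames the positional columns, then assembles the feature vector.
  def assemble(raw: DataFrame): DataFrame = {
    // toDF(names) requires exactly one name per existing column.
    val named = raw.toDF(columnNames: _*)

    new VectorAssembler()
      .setInputCols(Array("own", "want", "wish", "trade", "comment"))
      .setOutputCol("features")
      .transform(named)
  }

  def main(args: Array[String]): Unit = {
    // Local session so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("named-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for the header-less CSV data.
    val raw = Seq(
      (1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
      (2.0, 1.0, 0.0, 3.0, 2.0, 1.0)
    ).toDF()

    assemble(raw).show(false)
    spark.stop()
  }
}
```

With the real data, the same assemble step would be applied to the DataFrame returned by spark.read, in place of the in-memory rows used here.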