scala.collection.immutable.Iterable[org.apache.spark.sql.Row] to DataFrame ? error: overloaded method value createDataFrame with alternatives
Scala Spark - Get Overloaded method when calling createDataFrame
I'm trying to create a DataFrame from an array of arrays of `Double` (an `ArrayBuffer[Array[Double]]`), as follows:
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import scala.collection.mutable.ArrayBuffer

val points : ArrayBuffer[Array[Double]] = ArrayBuffer(
  Array(0.19238990024216676, 1.0, 0.0, 0.0),
  Array(0.2864319929878242, 0.0, 1.0, 0.0),
  Array(0.11160349352921925, 0.0, 2.0, 1.0),
  Array(0.3659220026496052, 2.0, 2.0, 0.0),
  Array(0.31809629470827383, 1.0, 1.0, 1.0))
val x = Array("__1", "__2", "__3", "__4")
val myschema = StructType(x.map(fieldName => StructField(fieldName, DoubleType, true)))
// Builds a collection of Rows, but the result is not assigned to anything
points.map(e => Row(e(0), e(1), e(2), e(3)))
val newDF = sqlContext.createDataFrame(points, myschema)
```
But I get this error:
```
<console>:113: error: overloaded method value createDataFrame with alternatives:
  (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
 cannot be applied to (scala.collection.mutable.ArrayBuffer[Array[Double]], org.apache.spark.sql.types.StructType)
       val newDF = sqlContext.createDataFrame(points, myschema)
```
I searched online but couldn't find a solution, so if anyone has an idea, please help!
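No overload of the method `createDataFrame` accepts an `ArrayBuffer[Array[Double]]`. Your call to `points.map` is not assigned to anything: `map` returns a new collection instead of modifying `points` in place. Try: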
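```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import scala.collection.JavaConverters._

// Each inner Seq holds the values of one row
val points: List[Seq[Double]] = List(
  Seq(0.19238990024216676, 1.0, 0.0, 0.0),
  Seq(0.2864319929878242, 0.0, 1.0, 0.0),
  Seq(0.11160349352921925, 0.0, 2.0, 1.0),
  Seq(0.3659220026496052, 2.0, 2.0, 0.0),
  Seq(0.31809629470827383, 1.0, 1.0, 1.0))
val x = Array("__1", "__2", "__3", "__4")
val myschema = StructType(x.map(fieldName => StructField(fieldName, DoubleType, true)))
// Convert each Seq to a Row, then to a java.util.List[Row] so the call matches
// the (rows: java.util.List[Row], schema: StructType) overload
val newDF = sqlContext.createDataFrame(
  points.map(Row.fromSeq(_)).asJava, myschema)
```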
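The answer's point about `map` can be checked without Spark at all. Below is a minimal sketch using only the Scala standard library; the object name `MapIsNotInPlace` and the values `firstBefore`, `unchanged`, and `rowsAsSeqs` are hypothetical and not part of either answer. It shows that calling `map` without assigning the result leaves `points` exactly as it was, and that only the assigned result holds the converted rows.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, Spark-free sketch: `map` returns a new collection and never
// rewrites the collection it is called on.
object MapIsNotInPlace {
  // Same element type as the question: a mutable buffer of Array[Double].
  val points: ArrayBuffer[Array[Double]] = ArrayBuffer(
    Array(0.19238990024216676, 1.0, 0.0, 0.0),
    Array(0.2864319929878242, 0.0, 1.0, 0.0)
  )

  // Remember the first element so we can check it was not replaced.
  val firstBefore: Array[Double] = points.head

  // Like the question's `points.map(e => Row(...))`: the mapped buffer is
  // built and then thrown away because it is never assigned.
  points.map(_.toSeq)

  // `points` still holds the very same Array objects.
  val unchanged: Boolean = points.head eq firstBefore

  // Assigning the result is what keeps the converted rows around.
  val rowsAsSeqs: ArrayBuffer[Seq[Double]] = points.map(_.toSeq)

  def main(args: Array[String]): Unit = {
    println(unchanged) // true
    println(rowsAsSeqs.head.mkString(", "))
  }
}
```

In the Spark code, the assigned sequence would then be turned into `Row`s (for example with `Row.fromSeq`) and passed to one of the overloads listed in the error, such as the `RDD[Row]` or `java.util.List[Row]` variant.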
This works for me:
```scala
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import scala.collection.mutable.ArrayBuffer

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val points : ArrayBuffer[Array[Double]] = ArrayBuffer(
  Array(0.19238990024216676, 1.0, 0.0, 0.0),
  Array(0.2864319929878242, 0.0, 1.0, 0.0),
  Array(0.11160349352921925, 0.0, 2.0, 1.0),
  Array(0.3659220026496052, 2.0, 2.0, 0.0),
  Array(0.31809629470827383, 1.0, 1.0, 1.0))
val x = Array("__1", "__2", "__3", "__4")
val myschema = StructType(x.map(fieldName => StructField(fieldName, DoubleType, true)))
// Turn each Array[Double] into a Row and distribute the rows as an RDD[Row],
// which matches the (rowRDD: RDD[Row], schema: StructType) overload
val rdd = sc.parallelize(points.map(e => Row(e(0), e(1), e(2), e(3))))
val newDF = sqlContext.createDataFrame(rdd, myschema)
newDF.show
```