Scala Spark - Overloaded method error when calling createDataFrame
I'm trying to create a DataFrame from an array of arrays of Double (ArrayBuffer[Array[Double]]), like below:
val points : ArrayBuffer[Array[Double]] = ArrayBuffer(
Array(0.19238990024216676, 1.0, 0.0, 0.0),
Array(0.2864319929878242, 0.0, 1.0, 0.0),
Array(0.11160349352921925, 0.0, 2.0, 1.0),
Array(0.3659220026496052, 2.0, 2.0, 0.0),
Array(0.31809629470827383, 1.0, 1.0, 1.0))
val x = Array("__1", "__2", "__3", "__4")
val myschema = StructType(x.map(fieldName ⇒ StructField(fieldName, DoubleType, true)))
points.map(e => Row(e(0), e(1), e(2), e(3)))
val newDF = sqlContext.createDataFrame(points, myschema)
But I get this error:
<console>:113: error: overloaded method value createDataFrame with alternatives:
(data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
cannot be applied to (scala.collection.mutable.ArrayBuffer[Array[Double]], org.apache.spark.sql.types.StructType)
val newDF = sqlContext.createDataFrame(points, myschema)
I searched over the internet but can't find out how to fix it! So if anyone has any idea about this, please help me!
There is no overload of method createDataFrame that accepts an instance of ArrayBuffer[Array[Double]]. Also, your call to points.map wasn't being assigned to anything: it returns a new collection rather than operating in-place. Try:
import scala.collection.JavaConverters._

val points : List[Seq[Double]] = List(
  Seq(0.19238990024216676, 1.0, 0.0, 0.0),
  Seq(0.2864319929878242, 0.0, 1.0, 0.0),
  Seq(0.11160349352921925, 0.0, 2.0, 1.0),
  Seq(0.3659220026496052, 2.0, 2.0, 0.0),
  Seq(0.31809629470827383, 1.0, 1.0, 1.0))
val x = Array("__1", "__2", "__3", "__4")
val myschema = StructType(x.map(fieldName ⇒ StructField(fieldName, DoubleType, true)))
// createDataFrame has no overload taking a Scala Seq[Row]; convert the mapped
// rows to a java.util.List[Row] with asJava
val newDF = sqlContext.createDataFrame(points.map(Row.fromSeq).asJava, myschema)
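The asJava conversion is needed because, as the error message lists, the only local-collection overload is (rows: java.util.List[org.apache.spark.sql.Row], schema: StructType); a plain Scala List[Row] won't match it. To sanity-check the result (a minimal usage sketch):

newDF.printSchema() // four nullable double columns: __1, __2, __3, __4
newDF.show()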
This works for me:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import scala.collection.mutable.ArrayBuffer
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val points : ArrayBuffer[Array[Double]] = ArrayBuffer(
Array(0.19238990024216676, 1.0, 0.0, 0.0),
Array(0.2864319929878242, 0.0, 1.0, 0.0),
Array(0.11160349352921925, 0.0, 2.0, 1.0),
Array(0.3659220026496052, 2.0, 2.0, 0.0),
Array(0.31809629470827383, 1.0, 1.0, 1.0))
val x = Array("__1", "__2", "__3", "__4")
val myschema = StructType(x.map(fieldName ⇒ StructField(fieldName, DoubleType, true)))
// Convert each Array[Double] to a Row, then distribute it as an RDD[Row]
val rdd = sc.parallelize(points.map(e => Row(e(0), e(1), e(2), e(3))))
val newDF = sqlContext.createDataFrame(rdd, myschema)
newDF.show
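For completeness, a shorter route is to map each Array[Double] to a tuple and let Spark derive the schema itself via toDF. A sketch, assuming the same sc/sqlContext as above and the standard sqlContext.implicits._; note that the column types are then inferred (non-nullable doubles) rather than taken from myschema:

import sqlContext.implicits._

// Pattern-match each 4-element array into a tuple, then name the columns;
// the schema is inferred from the tuple type instead of myschema
val newDF2 = points.map { case Array(a, b, c, d) => (a, b, c, d) }
  .toDF("__1", "__2", "__3", "__4")
newDF2.show()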