I have some sql.Row objects that I wish to convert to a DataFrame in Spark 1.6.x.
My Rows look like:
events: scala.collection.immutable.Iterable[org.apache.spark.sql.Row] = List([14183197,Browse,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session)], [14183197,Browse,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)])
Printed out:
events.foreach(println)
[14183197,Browse,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session)]
[14183197,Browse,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)]
So I created a schema for the data:
val schema = StructType(Array(
  StructField("trackId", IntegerType, true),
  StructField("location", StringType, true),
  StructField("videoId", IntegerType, true),
  StructField("id", StringType, true),
  StructField("sequence", IntegerType, true),
  StructField("time", StringType, true),
  StructField("type", ArrayType(StringType), true)
))
And then I attempt to create the DataFrame with:
val df = sqlContext.createDataFrame(events, schema)
But I get the following error:
error: overloaded method value createDataFrame with alternatives:
(data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
cannot be applied to (scala.collection.immutable.Iterable[org.apache.spark.sql.Row], org.apache.spark.sql.types.StructType)
I'm not sure why I get this. Is it because the underlying data in the Row has no type information?
Any help is greatly appreciated.
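You have to parallelize the collection first. As the error message shows, createDataFrame accepts an RDD[Row], a JavaRDD[Row], or a java.util.List[Row] together with a schema, but not a Scala Iterable[Row]: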
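import org.apache.spark.SparkContext
val sc: SparkContext = ???  // your existing SparkContext
// parallelize expects a Seq, so convert the Iterable first
val df = sqlContext.createDataFrame(sc.parallelize(events.toSeq), schema)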
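For reference, here is a minimal standalone sketch of the same idea, assuming Spark 1.6.x on the classpath and a local master. The object name RowsToDataFrame, the helper names fromRdd and fromJavaList, the shortened three-field schema, and the sample rows are all made up for illustration and are not from the question. It shows two of the overloads listed in the error message: the RDD[Row] one (via parallelize) and the java.util.List[Row] one (via JavaConverters).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types._
import scala.collection.JavaConverters._

object RowsToDataFrame {
  // Shortened, hypothetical schema (the real one in the question has seven fields).
  val schema: StructType = StructType(Array(
    StructField("trackId", IntegerType, true),
    StructField("location", StringType, true),
    StructField("type", ArrayType(StringType), true)
  ))

  // Option 1: turn the Iterable into a Seq, distribute it as an RDD[Row],
  // and use the (RDD[Row], StructType) overload.
  def fromRdd(sc: SparkContext, sqlContext: SQLContext, events: Iterable[Row]): DataFrame =
    sqlContext.createDataFrame(sc.parallelize(events.toSeq), schema)

  // Option 2: convert to java.util.List[Row] and use the
  // (java.util.List[Row], StructType) overload; no RDD is created by hand.
  def fromJavaList(sqlContext: SQLContext, events: Iterable[Row]): DataFrame =
    sqlContext.createDataFrame(events.toSeq.asJava, schema)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rows-to-df").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Illustrative sample rows whose value types match the schema above.
    val events: Iterable[Row] = List(
      Row(14183197, "Browse", Seq("NavigationLevel", "Session")),
      Row(14183197, "Browse", Seq("StartPlay", "Action", "Session"))
    )

    fromRdd(sc, sqlContext, events).show()
    fromJavaList(sqlContext, events).printSchema()

    sc.stop()
  }
}
```

Either route should work for a small in-memory collection like this one. Note that the values inside each Row must match the declared field types (for example an Int for IntegerType); a mismatch is typically only reported when the DataFrame is actually evaluated, not when it is created.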