
scala.collection.immutable.Iterable[org.apache.spark.sql.Row] to DataFrame? error: overloaded method value createDataFrame with alternatives

I have some sql.Row objects that I wish to convert to a DataFrame in Spark 1.6.x.

My Rows look like:

events: scala.collection.immutable.Iterable[org.apache.spark.sql.Row] = List([14183197,Browse,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session)], [14183197,Browse,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)])

Printed out:

events.foreach(println)
[14183197,Browse,80161702,8702170626376335,59,527780275219,List(NavigationLevel, Session)]
[14183197,Browse,80161356,8702171157207449,72,527780278061,List(StartPlay, Action, Session)]

So I created a schema for the data:

 val schema = StructType(Array(
    StructField("trackId", IntegerType, true),
    StructField("location", StringType, true),
    StructField("videoId", IntegerType, true),
    StructField("id", StringType, true),
    StructField("sequence", IntegerType, true),
    StructField("time", StringType, true),
    StructField("type", ArrayType(StringType), true)
  ))

I then attempt to create the DataFrame with:

val df = sqlContext.createDataFrame(events, schema)

But I get the following error:

   error: overloaded method value createDataFrame with alternatives:
  (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
 cannot be applied to (scala.collection.immutable.Iterable[org.apache.spark.sql.Row], org.apache.spark.sql.types.StructType)

I'm not sure why I get this. Is it because the underlying data in the Row has no type information?

Any help is greatly appreciated.

None of the createDataFrame overloads accepts a Scala Iterable; they take an RDD or a java.util.List. You have to parallelize the collection into an RDD first:

val sc: SparkContext = ???
// sc.parallelize expects a Seq, so convert the Iterable first
val df = sqlContext.createDataFrame(sc.parallelize(events.toSeq), schema)
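As a minimal end-to-end sketch (assuming a local Spark 1.6-style SQLContext; the sample row values are hypothetical stand-ins shaped like the question's data), the whole conversion looks like this. Note the .toSeq call: sc.parallelize is declared to take a Seq, so an Iterable[Row] must be converted before the RDD overload of createDataFrame applies.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types._

// Hypothetical local setup mirroring the question's Spark 1.6.x environment
val conf = new SparkConf().setAppName("rows-to-df").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Same schema as in the question
val schema = StructType(Array(
  StructField("trackId", IntegerType, true),
  StructField("location", StringType, true),
  StructField("videoId", IntegerType, true),
  StructField("id", StringType, true),
  StructField("sequence", IntegerType, true),
  StructField("time", StringType, true),
  StructField("type", ArrayType(StringType), true)
))

// Rows shaped like the question's data; values here are illustrative
val events: Iterable[Row] = List(
  Row(14183197, "Browse", 80161702, "8702170626376335", 59, "527780275219",
      Seq("NavigationLevel", "Session")),
  Row(14183197, "Browse", 80161356, "8702171157207449", 72, "527780278061",
      Seq("StartPlay", "Action", "Session"))
)

// parallelize takes a Seq, so convert the Iterable explicitly,
// which selects the (RDD[Row], StructType) overload of createDataFrame
val df = sqlContext.createDataFrame(sc.parallelize(events.toSeq), schema)
df.show()
```

Alternatively, the (java.util.List[Row], StructType) overload would also work via events.toList converted with scala.collection.JavaConverters, but parallelizing keeps the data distributed.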
