
Cannot resolve overloaded method 'createDataFrame'

The following code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val data1 = Seq(("Android", 1, "2021-07-24 12:01:19.000", "play"), ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
  ("Apple", 1, "2021-07-24 12:03:19.000", "play"), ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

val schema1 = StructType(Array(StructField("device_id", StringType, true),
  StructField("video_id", IntegerType, true),
  StructField("event_timestamp", StringType, true),
  StructField("event_type", StringType, true)
))

val spark = SparkSession.builder()
  .enableHiveSupport()
  .appName("PlayStop")
  .getOrCreate()

var transaction = spark.createDataFrame(data1, schema1)

produces the error:

Cannot resolve overloaded method 'createDataFrame'

Why?

And how to fix it?

The error occurs because no createDataFrame() overload accepts a Seq of tuples together with a StructType: the overloads that take an explicit schema expect an RDD[Row] (or a java.util.List[Row]), so the compiler cannot resolve the call.

If your schema only uses default StructField settings, the easiest way to create a DataFrame is to simply apply toDF():

val transaction = data1.toDF("device_id", "video_id", "event_timestamp", "event_type")
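
Note that toDF on a local Seq only compiles when the session's implicits are in scope. A minimal sketch, assuming the spark session defined in the question:

import spark.implicits._

val transaction = data1.toDF("device_id", "video_id", "event_timestamp", "event_type")
// video_id is inferred as a non-nullable integer column here, since it comes from a Scala Int
transaction.printSchema()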

To apply a custom schema definition, note that createDataFrame() takes an RDD[Row] and a schema as its parameters. In your case, you can turn data1 into an RDD[Row] as below, using Row.fromTuple to convert each tuple into a Row with one field per tuple element (Row is org.apache.spark.sql.Row):

val transaction = spark.createDataFrame(spark.sparkContext.parallelize(data1.map(Row.fromTuple)), schema1)
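
If you prefer to spell out the tuple-to-Row mapping instead of relying on Row.fromTuple, a sketch that pattern-matches on each tuple (assuming the same data1, schema1 and spark as above):

import org.apache.spark.sql.Row

// one Row per tuple, one field per schema column
val rows = data1.map { case (deviceId, videoId, ts, eventType) => Row(deviceId, videoId, ts, eventType) }
val transaction = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema1)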

An alternative is to use toDF, followed by rdd, which exposes a DataFrame (i.e. Dataset[Row]) as an RDD[Row]:

val transaction = spark.createDataFrame(data1.toDF.rdd, schema1)
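
With either variant, the resulting DataFrame carries schema1 as given, which you can check with printSchema(); it should print something like:

transaction.printSchema()
// root
//  |-- device_id: string (nullable = true)
//  |-- video_id: integer (nullable = true)
//  |-- event_timestamp: string (nullable = true)
//  |-- event_type: string (nullable = true)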
