Cannot resolve overloaded method 'createDataFrame'
The following code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
val data1 = Seq(("Android", 1, "2021-07-24 12:01:19.000", "play"), ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
  ("Apple", 1, "2021-07-24 12:03:19.000", "play"), ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))
val schema1 = StructType(Array(
  StructField("device_id", StringType, true),
  StructField("video_id", IntegerType, true),
  StructField("event_timestamp", StringType, true),
  StructField("event_type", StringType, true)
))
val spark = SparkSession.builder()
  .enableHiveSupport()
  .appName("PlayStop")
  .getOrCreate()
val transaction = spark.createDataFrame(data1, schema1)
produces the error:
Cannot resolve overloaded method 'createDataFrame'
Why? And how can I fix it?
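If the schema Spark infers from the tuple types is good enough, the easiest way to create a DataFrame is to simply apply toDF(), which needs the session's implicit conversions. Note that the inferred schema is not identical to schema1: a column that comes from a Scala Int, such as video_id, is inferred as non-nullable.
import spark.implicits._
val transaction = data1.toDF("device_id", "video_id", "event_timestamp", "event_type")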
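A self-contained version of the toDF() approach, written as a minimal sketch that assumes a local session; the object name ToDFSketch and the local[*] master are illustrative choices, not part of the original answer. Printing the schema shows what toDF() infers, including the non-nullable video_id column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ToDFSketch {
  // Same sample data as in the question
  val data1 = Seq(
    ("Android", 1, "2021-07-24 12:01:19.000", "play"),
    ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
    ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
    ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

  def build(spark: SparkSession): DataFrame = {
    // toDF on a local Seq is provided by this session's implicit conversions
    import spark.implicits._
    data1.toDF("device_id", "video_id", "event_timestamp", "event_type")
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the sketch outside a cluster
    val spark = SparkSession.builder().master("local[*]").appName("PlayStop").getOrCreate()
    try {
      // The inferred schema reports video_id as integer (nullable = false)
      build(spark).printSchema()
    } finally spark.stop()
  }
}
```

If you need video_id to be nullable (as in schema1), use one of the createDataFrame() variants with an explicit schema instead.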
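To specify a custom schema, note that the createDataFrame() overloads that accept a schema take the data as an RDD[Row] (or a java.util.List[Row]) together with a StructType. The overload that accepts a Seq, createDataFrame(data: Seq[A]), takes no schema at all and infers one from the tuple or case-class type. No overload matches a Seq of tuples combined with a StructType, which is why the call cannot be resolved. In your case, you could transform data1 into an RDD[Row] like below: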
val transaction = spark.createDataFrame(spark.sparkContext.parallelize(data1.map(Row.fromTuple(_))), schema1)
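A self-contained sketch of the RDD[Row] route, assuming a local session (the object name RddRowSketch and the local[*] master are illustrative). The key detail is how each tuple becomes a Row: Row.fromTuple spreads the four tuple elements into four fields, whereas Row(t) with a tuple t builds a single-field Row holding the whole tuple, which cannot match the four fields of schema1. spark.sparkContext is used because sc is only predefined in spark-shell.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object RddRowSketch {
  val data1 = Seq(
    ("Android", 1, "2021-07-24 12:01:19.000", "play"),
    ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
    ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
    ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

  val schema1 = StructType(Array(
    StructField("device_id", StringType, true),
    StructField("video_id", IntegerType, true),
    StructField("event_timestamp", StringType, true),
    StructField("event_type", StringType, true)))

  // Row.fromTuple spreads each tuple into a Row with one field per element.
  // Row(t) would instead wrap the whole tuple in a single field.
  val rows: Seq[Row] = data1.map(t => Row.fromTuple(t))

  def build(spark: SparkSession): DataFrame =
    // Matches the createDataFrame(RDD[Row], StructType) overload
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema1)

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the sketch outside a cluster
    val spark = SparkSession.builder().master("local[*]").appName("PlayStop").getOrCreate()
    try build(spark).show(false)
    finally spark.stop()
  }
}
```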
An alternative is to use toDF, followed by rdd, which returns the DataFrame (i.e. Dataset[Row]) as an RDD[Row] (toDF again requires import spark.implicits._):
val transaction = spark.createDataFrame(data1.toDF.rdd, schema1)
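Besides RDD[Row], SparkSession.createDataFrame also has an overload that takes a java.util.List[Row] and a StructType, so a small local collection can be passed with the custom schema without building an RDD first. A minimal sketch, assuming a local session; the object name JavaListSketch and the local[*] master are illustrative, and this variant is an addition rather than part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object JavaListSketch {
  val data1 = Seq(
    ("Android", 1, "2021-07-24 12:01:19.000", "play"),
    ("Android", 1, "2021-07-24 12:02:19.000", "stop"),
    ("Apple", 1, "2021-07-24 12:03:19.000", "play"),
    ("Apple", 1, "2021-07-24 12:04:19.000", "stop"))

  val schema1 = StructType(Array(
    StructField("device_id", StringType, true),
    StructField("video_id", IntegerType, true),
    StructField("event_timestamp", StringType, true),
    StructField("event_type", StringType, true)))

  def build(spark: SparkSession): DataFrame = {
    // Spread each tuple into a four-field Row
    val rows: Seq[Row] = data1.map(t => Row.fromTuple(t))
    // java.util.Arrays.asList yields a java.util.List[Row], which matches
    // the createDataFrame(java.util.List[Row], StructType) overload
    spark.createDataFrame(java.util.Arrays.asList[Row](rows: _*), schema1)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the sketch outside a cluster
    val spark = SparkSession.builder().master("local[*]").appName("PlayStop").getOrCreate()
    try build(spark).show(false)
    finally spark.stop()
  }
}
```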