Spark SQL's Scala API - TimestampType - No Encoder found for org.apache.spark.sql.types.TimestampType
I am using Spark 2.1 with Scala 2.11 on a Databricks notebook.

What exactly is TimestampType?

We know from Spark SQL's documentation that the official timestamp type is TimestampType, which is apparently an alias for java.sql.Timestamp. TimestampType can be found in Spark SQL's Scala API.

Behavior differs between using a schema and using the Dataset API when parsing {"time":1469501297,"action":"Open"} from Databricks' Scala Structured Streaming example.

Using a JSON schema --> OK (I do prefer using the elegant Dataset API):
import org.apache.spark.sql.types._

val jsonSchema = new StructType().add("time", TimestampType).add("action", StringType)

val staticInputDF =
  spark
    .read
    .schema(jsonSchema)
    .json(inputPath)
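For reference, the "time" field in the sample record holds epoch seconds, and Spark's JSON reader interprets a numeric field declared as TimestampType as seconds since the epoch. A pure-Scala sketch of the same conversion, with no Spark required (1469501297 is the value from the sample record; java.sql.Timestamp's constructor takes milliseconds):

```scala
// Epoch seconds -> java.sql.Timestamp, mirroring how Spark reads a
// numeric JSON field into a TimestampType column.
val epochSeconds = 1469501297L
val ts = new java.sql.Timestamp(epochSeconds * 1000L) // constructor takes millis
println(ts) // printed form depends on the JVM's default time zone
```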
Using the Dataset API --> KO: No Encoder found for TimestampType

Creating the Event class:
import org.apache.spark.sql.types._
case class Event(action: String, time: TimestampType)
--> defined class Event
Errors occur when reading the events from DBFS on Databricks. Note: we don't get the error when using java.sql.Timestamp as the type for "time".
val path = "/databricks-datasets/structured-streaming/events/"
val events = spark.read.json(path).as[Event]
Error message:
java.lang.UnsupportedOperationException: No Encoder found for org.apache.spark.sql.types.TimestampType
- field (class: "org.apache.spark.sql.types.TimestampType", name: "time")
- root class:
Combining the schema read method .schema(jsonSchema) with the as[Type] method, where the type uses java.sql.Timestamp, solves this issue. The idea came after reading the Structured Streaming documentation, Creating streaming DataFrames and streaming Datasets:
These examples generate streaming DataFrames that are untyped, meaning that the schema of the DataFrame is not checked at compile time, only checked at runtime when the query is submitted. Some operations like map, flatMap, etc. need the type to be known at compile time. To do those, you can convert these untyped streaming DataFrames to typed streaming Datasets using the same methods as static DataFrame.
val path = "/databricks-datasets/structured-streaming/events/"
val jsonSchema = new StructType().add("time", TimestampType).add("action", StringType)
case class Event(action: String, time: java.sql.Timestamp)

val staticInputDS =
  spark
    .read
    .schema(jsonSchema)
    .json(path)
    .as[Event]
staticInputDS.printSchema
Will output:
root
|-- time: timestamp (nullable = true)
|-- action: string (nullable = true)
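Once the data is typed as a Dataset[Event], the compile-time-checked operations the documentation mentions (map, filter, etc.) become available. A minimal sketch, assuming the staticInputDS defined above and the implicits from the active SparkSession:

```scala
// Sketch: typed operations on Dataset[Event]; assumes `staticInputDS` and
// `spark` from the snippets above.
import spark.implicits._

val openEvents = staticInputDS
  .filter(_.action == "Open")           // field access checked at compile time
  .map(e => (e.action, e.time.getTime)) // java.sql.Timestamp -> epoch millis

openEvents.show(5)
```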
TimestampType is not an alias for java.sql.Timestamp, but rather a representation of a timestamp type for Spark's internal usage. In general you don't want to use TimestampType in your code.

The idea is that java.sql.Timestamp is supported by Spark SQL natively, so you can define your event class as follows:

case class Event(action: String, time: java.sql.Timestamp)

Internally, Spark will then use TimestampType to model the type of a value at runtime, when compiling and optimizing your query, but this is not something you're interested in most of the time.
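The division of labor can be summarized in one sketch: TimestampType is a DataType value that describes a column in a schema, while java.sql.Timestamp is the JVM class that the encoder maps that column to in a case class (a sketch assuming a SparkSession named spark is in scope):

```scala
// Sketch of where each type lives.
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

// Schema side: TimestampType is a DataType *value* describing a column.
val schema = new StructType()
  .add("time", TimestampType)
  .add("action", StringType)

// JVM side: the encoder maps the timestamp column to java.sql.Timestamp.
case class Event(action: String, time: java.sql.Timestamp)

// spark.read.schema(schema).json(path).as[Event] ties the two together.
```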