

Equivalent of getLong for a TimestampType/java.sql.Timestamp?

I am trying to extract the values from a Spark Structured Streaming DataFrame using Scala, with code like this:

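import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._  // for the $"..." column syntax

var txs = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", KAFKABS)
  .option("subscribe", "txs")
  .load()
txs = txs.selectExpr("CAST(value AS STRING)")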

val schema = StructType(Seq(
  StructField("from", StringType, true),
  StructField("to", StringType, true),
  StructField("timestamp", TimestampType, true),
  StructField("hash", StringType, true),
  StructField("value", StringType, true)
))

txs = txs.selectExpr("cast (value as string) as json")
  .select(from_json($"json", schema).as("data"))
  .select("data.*")
  .selectExpr("from", "to", "cast(timestamp as timestamp) as timestamp", "hash", "value")

val newDataFrame = txs
  .flatMap(row => {
    // Column order follows the schema: from = 0, to = 1, timestamp = 2
    val from = row.getString(0)
    val to = row.getString(1)
    // val timestamp = row.getTimestamp??

    // do stuff
  })

I am wondering if there is an equivalent typed get method for timestamps? To add to my confusion, there seemed to be some sort of hidden mapping (hidden to me at least) between the SQL types I am defining for my structured stream and the actual types of the variables when I access them through the flatMap function. I looked at the docs, and this was indeed the case. According to the documentation:

Returns the value at position i. If the value is null, null is returned. The following is a mapping between Spark SQL types and return types:

BooleanType -> java.lang.Boolean
ByteType -> java.lang.Byte
ShortType -> java.lang.Short
IntegerType -> java.lang.Integer
FloatType -> java.lang.Float
DoubleType -> java.lang.Double
StringType -> String
DecimalType -> java.math.BigDecimal
DateType -> java.sql.Date
TimestampType -> java.sql.Timestamp
BinaryType -> byte array
ArrayType -> scala.collection.Seq (use getList for java.util.List)
MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
StructType -> org.apache.spark.sql.Row

Given that, I would have expected this mapping to be baked into the Row class more formally, as an interface that it implements, but apparently that is not the case :( It seems that in the case of TimestampType/java.sql.Timestamp, I have to abandon my timestamp type for something else? Someone please explain why I'm wrong! I've only been using Scala and Spark for 3-4 months now.

-Paul

You have correctly deduced that the Scala type of a TimestampType column is java.sql.Timestamp.
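Since the value is an ordinary java.sql.Timestamp, the closest thing to a getLong is to read the timestamp and then ask it for its epoch value. A minimal sketch using only the JDK follows; the object and helper names are made up for illustration and are not part of Spark:

```scala
import java.sql.Timestamp
import java.time.Instant

object TimestampAsLong {
  // Milliseconds since 1970-01-01T00:00:00Z, the usual Long form of a timestamp.
  def toEpochMillis(ts: Timestamp): Long = ts.getTime

  // Inverse direction: rebuild a Timestamp from epoch milliseconds.
  def fromEpochMillis(millis: Long): Timestamp = new Timestamp(millis)

  def main(args: Array[String]): Unit = {
    val ts = Timestamp.from(Instant.parse("2021-01-01T12:00:00Z"))
    val millis = toEpochMillis(ts)
    println(millis)
    // Round trip back to a Timestamp and compare with the original.
    println(fromEpochMillis(millis) == ts)
  }
}
```

Note that getTime is millisecond-based, so any sub-millisecond part of the timestamp is not carried over into the Long.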

As of v1.5.0, org.apache.spark.sql.Row has a getTimestamp(i: Int) method, so you can call it and get a java.sql.Timestamp:

val timestamp = row.getTimestamp(2)  // index 2 is the "timestamp" column in the schema above

Even if you use earlier versions, there's no need to abandon this type; you can simply use getAs[T](i: Int) with java.sql.Timestamp:

val timestamp = row.getAs[java.sql.Timestamp](2)
// OR:
val timestamp = row.getAs[java.sql.Timestamp]("timestamp")
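To see how this fits into the flatMap from the question, here is a rough, self-contained sketch. It assumes Spark SQL is on the classpath and swaps the Kafka stream for a small in-memory batch DataFrame with the same schema, so it can run locally. The object name, the extract helper and the sample rows are hypothetical, and emitting (from, to, epochMillis) is just one possible "do stuff".

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.types._

object TimestampFromRow {
  // Same column layout as the question's schema: from, to, timestamp, hash, value.
  val schema: StructType = StructType(Seq(
    StructField("from", StringType, nullable = true),
    StructField("to", StringType, nullable = true),
    StructField("timestamp", TimestampType, nullable = true),
    StructField("hash", StringType, nullable = true),
    StructField("value", StringType, nullable = true)
  ))

  // Emits (from, to, epochMillis) per row; rows with a null timestamp are dropped.
  def extract(txs: DataFrame): Dataset[(String, String, Long)] = {
    val spark = txs.sparkSession
    import spark.implicits._ // encoder for the (String, String, Long) tuples

    txs.flatMap { row =>
      val tsIndex = row.fieldIndex("timestamp")
      if (row.isNullAt(tsIndex)) {
        Seq.empty[(String, String, Long)]
      } else {
        // Typed getter; row.getAs[Timestamp]("timestamp") would work as well.
        val ts: Timestamp = row.getTimestamp(tsIndex)
        Seq((row.getAs[String]("from"), row.getAs[String]("to"), ts.getTime))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("timestamp-from-row")
      .getOrCreate()

    // Made-up sample data standing in for the parsed Kafka messages.
    val rows = java.util.Arrays.asList(
      Row("alice", "bob", Timestamp.valueOf("2021-01-01 12:00:00"), "h1", "10"),
      Row("carol", "dave", null, "h2", "20")
    )
    val txs = spark.createDataFrame(rows, schema)

    extract(txs).show(truncate = false)
    spark.stop()
  }
}
```

Looking columns up by name rather than by position avoids off-by-one mistakes when the schema changes, and the isNullAt check matters because the typed getter returns null for a null column, so calling getTime on it would throw.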


 