Spark UDF type mismatch error

I'm trying to write a UDF to convert a timestamp into an integer representing the hour of the week. I can easily accomplish this with Spark SQL, like this:

(Screenshot of the Spark SQL query not available.)

I have many UDFs in our code with exactly this syntax, but this one throws a type mismatch error. I also tried invoking my UDF with col("session_ts_start"), but that did not work either.

import spark.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions._

def getHourOfWeek() = udf(
    (ts: Timestamp) => unix_timestamp(ts)
)

val dDF = df.withColumn("hour", getHourOfWeek()(df("session_ts_start")))
dDF.show()

<console>:154: error: type mismatch;
 found   : java.sql.Timestamp
 required: org.apache.spark.sql.Column
           (ts: Timestamp) => unix_timestamp(ts)

unix_timestamp is a SQL function. It operates on Columns, not on external values:

def unix_timestamp(s: Column): Column 

so it cannot be used inside a UDF, whose body receives plain JVM values (here a java.sql.Timestamp), not Columns.
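If you do want a UDF, its body has to compute the result from the java.sql.Timestamp value itself using ordinary JVM code. The following is a minimal sketch of such a body, not a drop-in replacement: the object and method names are hypothetical, it counts Monday 00:00 as hour 0 (the convention used in the solutions below), and it evaluates the timestamp in the JVM default time zone, which may differ from Spark's session time zone.

```scala
import java.sql.Timestamp

object HourOfWeekUdf {
  // Hypothetical helper: plain JVM code, so it can run inside a UDF body.
  // Monday 00:00 -> 0, ..., Sunday 23:00 -> 167, in the JVM default time zone.
  // A null input (e.g. a null column value) yields None instead of throwing.
  def hourOfWeek(ts: Timestamp): Option[Int] =
    Option(ts).map { t =>
      val local = t.toLocalDateTime // java.time.LocalDateTime
      // DayOfWeek.getValue: 1 = Monday, ..., 7 = Sunday
      (local.getDayOfWeek.getValue - 1) * 24 + local.getHour
    }
}
```

Wrapped as udf(HourOfWeekUdf.hourOfWeek _), it could then be applied to col("session_ts_start"); the Option[Int] return type is meant to turn null inputs into null results rather than a NullPointerException.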

Quoting the question: "I'm trying (...) to convert a timestamp into an integer representing the hour of the week". This can be done with built-in functions alone:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{date_format, hour}
import spark.implicits._ // for toDF and $"..."; already in scope in spark-shell

def getHourOfWeek(c: Column): Column =
  // "u" = day number of week (1 = Monday, ..., 7 = Sunday), see
  // https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html
  (date_format(c, "u").cast("integer") - 1) * 24 + hour(c)

val df = Seq("2017-03-07 01:00:00").toDF("ts").select($"ts".cast("timestamp"))

df.select(getHourOfWeek($"ts").alias("hour")).show
+----+
|hour|
+----+
|  25|
+----+
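Note that Spark 3.0 and later reject the week-based pattern letter "u" in date_format by default (it is only accepted with spark.sql.legacy.timeParserPolicy set to LEGACY). One possible replacement, assuming Spark 2.3+ where dayofweek is available, is ((dayofweek(c) + 5) % 7) * 24 + hour(c): dayofweek numbers days 1 = Sunday through 7 = Saturday, so adding 5 and taking the remainder mod 7 maps Monday to 0 and Sunday to 6. The plain-Scala sketch below only checks that remapping against java.time; the object and method names are hypothetical.

```scala
import java.time.{DayOfWeek, LocalDateTime}

// Hypothetical plain-Scala check of the remapping used in the Spark expression
//   ((dayofweek(c) + 5) % 7) * 24 + hour(c)
object DayOfWeekRemap {
  // Spark's dayofweek numbering: 1 = Sunday, 2 = Monday, ..., 7 = Saturday.
  // java.time numbers Monday = 1 ... Sunday = 7, so convert for the check.
  def sparkDayOfWeek(d: DayOfWeek): Int = d.getValue % 7 + 1

  // Monday -> 0, ..., Sunday -> 6 (what date_format(c, "u") - 1 produced).
  def mondayBasedIndex(sparkDow: Int): Int = (sparkDow + 5) % 7

  // Same hour-of-week formula as getHourOfWeek, driven by the remapped index.
  def hourOfWeek(t: LocalDateTime): Int =
    mondayBasedIndex(sparkDayOfWeek(t.getDayOfWeek)) * 24 + t.getHour
}
```

For the example row, DayOfWeekRemap.hourOfWeek(LocalDateTime.of(2017, 3, 7, 1, 0)) gives 25, matching the output above.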

Another possible solution:

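import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{next_day, date_sub}

def getHourOfWeek2(c: Column): Column = ((
  c.cast("bigint") - // seconds since the epoch
  // next_day(c, "Mon") is the first Monday after c; minus 7 days gives the
  // Monday on or before c, cast to the timestamp of its midnight
  date_sub(next_day(c, "Mon"), 7).cast("timestamp").cast("bigint")
) / 3600).cast("int") // elapsed seconds -> whole hours

df.select(getHourOfWeek2($"ts").alias("hour")).show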
+----+
|hour|
+----+
|  25|
+----+

Note: Neither solution handles daylight saving time or other date/time subtleties.
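To illustrate the kind of subtlety meant here, the sketch below (plain Scala with java.time; names hypothetical, America/New_York assumed as the session time zone) computes both a wall-clock hour of week, as getHourOfWeek does, and the elapsed hours since Monday 00:00, as getHourOfWeek2 does by subtracting epoch seconds. The two disagree once the week contains a daylight saving transition: US clocks moved forward one hour on Sunday 2017-03-12, so Sunday 12:00 that week is wall-clock hour 156 but only 155 elapsed hours after Monday 00:00.

```scala
import java.time.{DayOfWeek, Duration, LocalDateTime, ZoneId}
import java.time.temporal.TemporalAdjusters

object DstHourOfWeek {
  // Assumed session time zone for the illustration.
  val zone: ZoneId = ZoneId.of("America/New_York")

  // Wall-clock hour of week (Monday 00:00 = 0), as getHourOfWeek computes it.
  def wallClockHourOfWeek(t: LocalDateTime): Int =
    (t.getDayOfWeek.getValue - 1) * 24 + t.getHour

  // Elapsed hours since Monday 00:00 of the same week, as getHourOfWeek2
  // computes it from epoch seconds. previousOrSame(MONDAY) plays the role
  // of date_sub(next_day(c, "Mon"), 7).
  def elapsedHourOfWeek(t: LocalDateTime): Long = {
    val weekStart = t.toLocalDate
      .`with`(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY))
      .atStartOfDay(zone)
    Duration.between(weekStart, t.atZone(zone)).toHours
  }
}
```

For the example row (Tuesday 2017-03-07 01:00) both methods return 25; for Sunday 2017-03-12 12:00 they return 156 and 155 respectively.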

Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.