Apache Spark function to_timestamp() not working with PySpark on Databricks
I'm getting NULL output when I execute to_timestamp().
The code that I'm executing is as follows:
.withColumn("LAST_MODIFICATION_DT", to_timestamp(concat(col('LAST_MOD_DATE'), lit(' '), col('LAST_MOD_TIME')), 'yyy-MM-dd HH:mm:ss'))
The schema for the fields LAST_MOD_DATE & LAST_MOD_TIME is as follows:
I'm getting NULL output for the column 'LAST_MODIFICATION_DT'.
Any thoughts?
In Spark SQL, concat doesn't convert null to ''; any null argument will cascade into a null result. It's often easier to write these kinds of expressions in Python and register them as UDFs, e.g.
from pyspark.sql.types import StringType

# Concatenate two values with a space; str() turns a null input into the text 'None'
# instead of letting it propagate into a null result.
def concat2_(s1, s2) -> str:
    return str(s1) + ' ' + str(s2)

concat2 = spark.udf.register("concat2", concat2_, StringType())
Then you can use it in Spark queries built in Python:
from pyspark.sql.functions import col
df = spark.sql('select 1 a, 2 b').withColumn("c",concat2(col('a'),col('b')))
display(df)
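To make the null behavior concrete, here is a minimal sketch (the sample values are made up for illustration; it assumes the concat2 UDF registered above and a Spark session named spark, as in the snippets above):

# Illustration only: the built-in concat returns NULL as soon as any argument is NULL,
# while the concat2 UDF stringifies None instead of propagating the null.
from pyspark.sql.functions import concat, col, lit

sample = spark.sql("select '2021-06-01' d, cast(null as string) t")
display(sample.select(
    concat(col('d'), lit(' '), col('t')).alias('built_in_concat'),  # -> null
    concat2(col('d'), col('t')).alias('udf_concat2')                # -> '2021-06-01 None'
))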
or in SQL:
%sql
with q as
(select 1 a, 2 b)
select a,b,concat2(a,b) c
from q
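Applied back to the question's expression, one possible rewrite looks like this. This is a sketch only: it assumes LAST_MOD_DATE is formatted as yyyy-MM-dd and LAST_MOD_TIME as HH:mm:ss, and it reuses the concat2 UDF from above, which already inserts the space separator:

from pyspark.sql.functions import to_timestamp, col

df = df.withColumn(
    "LAST_MODIFICATION_DT",
    # concat2 adds the ' ' separator itself, so no lit(' ') is needed;
    # adjust the pattern if the actual column formats differ.
    to_timestamp(concat2(col('LAST_MOD_DATE'), col('LAST_MOD_TIME')), 'yyyy-MM-dd HH:mm:ss')
)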