Apache Spark function to_timestamp() not working with PySpark on Databricks
I'm getting NULL output when I execute to_timestamp().
The code that I'm executing is as follows:
.withColumn("LAST_MODIFICATION_DT", to_timestamp(concat(col('LAST_MOD_DATE'), lit(' '), col('LAST_MOD_TIME')), 'yyy-MM-dd HH:mm:ss'))
The schema for the fields LAST_MOD_DATE & LAST_MOD_TIME is as follows:
I'm getting NULL output for the column 'LAST_MODIFICATION_DT'.
Any thoughts?
In Spark SQL, concat doesn't convert null to ''; any null argument will cascade into a null result. It's often easier to write these kinds of expressions in Python and register them as UDFs, e.g.
from pyspark.sql.types import StringType

# Concatenate two values with a space; str() turns a null input into the text 'None'
# instead of letting it propagate into a null result.
def concat2_(s1, s2) -> str:
    return str(s1) + ' ' + str(s2)

concat2 = spark.udf.register("concat2", concat2_, StringType())
Then you can use it in Spark queries built in Python:
from pyspark.sql.functions import col
df = spark.sql('select 1 a, 2 b').withColumn("c",concat2(col('a'),col('b')))
display(df)
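To make the null behavior concrete, here is a minimal sketch (the sample values are made up for illustration; it assumes the concat2 UDF registered above and a Spark session named spark, as in the snippets above):

# Illustration only: the built-in concat returns NULL as soon as any argument is NULL,
# while the concat2 UDF stringifies None instead of propagating the null.
from pyspark.sql.functions import concat, col, lit

sample = spark.sql("select '2021-06-01' d, cast(null as string) t")
display(sample.select(
    concat(col('d'), lit(' '), col('t')).alias('built_in_concat'),  # -> null
    concat2(col('d'), col('t')).alias('udf_concat2')                # -> '2021-06-01 None'
))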
or in SQL:
%sql
with q as
(select 1 a, 2 b)
select a,b,concat2(a,b) c
from q
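Applied back to the question's expression, one possible rewrite looks like this. This is a sketch only: it assumes LAST_MOD_DATE is formatted as yyyy-MM-dd and LAST_MOD_TIME as HH:mm:ss, and it reuses the concat2 UDF from above, which already inserts the space separator:

from pyspark.sql.functions import to_timestamp, col

df = df.withColumn(
    "LAST_MODIFICATION_DT",
    # concat2 adds the ' ' separator itself, so no lit(' ') is needed;
    # adjust the pattern if the actual column formats differ.
    to_timestamp(concat2(col('LAST_MOD_DATE'), col('LAST_MOD_TIME')), 'yyyy-MM-dd HH:mm:ss')
)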