Unable to concatenate and add a column with the Apache Spark function to_timestamp() using PySpark on Databricks

Question

I'm trying to concatenate two columns on an Apache Spark table, convert the result with to_timestamp(), and add it as a new column using the .withColumn function, but it won't work.

The code is as follows:

DIM_WORK_ORDER.withColumn("LAST_MODIFICATION_DT", to_timestamp(concat(col('LAST_MOD_DATE'), lit(' '), col('LAST_MOD_TIME')), 'yyyyMMdd HHmmss'))

The result I would expect to see is something like:

LAST_MODIFICATION_DT | LAST_MODIFICATION_DT | WORK_ORDER

However, I'm getting the following result:

Some data to work with:

WORK_ORDER  LAST_MOD_TIME
10000008    null
11358186    142254
10000007    193402
10000009    null

Any thoughts?

Answer 1

As far as I know, DataFrames in Spark are immutable: once you have created a DataFrame, it can't change. withColumn therefore returns a new DataFrame instead of modifying the original, so the result has to be assigned to a variable.

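%python
from pyspark.sql.functions import col, concat, lit, to_timestamp
# Read the CSV with its header row (`spark` is the session Databricks provides).
df = spark.read.option("header", "true").csv("<input file path>")
# withColumn returns a new DataFrame, so keep the result in df1.
df1 = df.withColumn("LAST_MODIFICATION_DT", to_timestamp(concat(col('LAST_MOD_DATE'), lit(' '), col('LAST_MOD_TIME')), 'yyyyMMdd HHmmss'))
display(df1)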

I get the expected output with this code. If this is not what you expect, please provide more information.
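To see what the 'yyyyMMdd HHmmss' pattern does to the sample values, and why rows with a null LAST_MOD_TIME would come out null, here is a plain-Python analogue of the concatenate-then-parse step. It is not Spark code: it uses datetime.strptime with '%Y%m%d %H%M%S' as the counterpart of the Spark pattern, and the helper name combine and the date 20220313 are made up for illustration, since the question does not show any LAST_MOD_DATE values.

```python
from datetime import datetime
from typing import Optional


def combine(last_mod_date: Optional[str], last_mod_time: Optional[str]) -> Optional[datetime]:
    """Mimic to_timestamp(concat(date, ' ', time), 'yyyyMMdd HHmmss') for one row."""
    # Spark's concat yields null when any input is null; mirror that with None.
    if last_mod_date is None or last_mod_time is None:
        return None
    # '%Y%m%d %H%M%S' is the strptime counterpart of 'yyyyMMdd HHmmss'.
    return datetime.strptime(f"{last_mod_date} {last_mod_time}", "%Y%m%d %H%M%S")


# WORK_ORDER and LAST_MOD_TIME come from the question; the date is hypothetical.
rows = [
    ("10000008", "20220313", None),
    ("11358186", "20220313", "142254"),
    ("10000007", "20220313", "193402"),
    ("10000009", "20220313", None),
]

result = {work_order: combine(d, t) for work_order, d, t in rows}
for work_order, ts in result.items():
    print(work_order, ts)
```

If the null rows should still get a timestamp, the missing time would need a default value before concatenating. Whether that is appropriate depends on the data.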


Solution 1
1 Accepted 2022-03-13 18:06:35