Unable to concatenate with the Apache Spark function to_timestamp() on Databricks using PySpark and add a column
I'm trying to use concat() with to_timestamp() on an Apache Spark table and add a column using the .withColumn() function, but it won't work.
The code is as follows:
DIM_WORK_ORDER.withColumn("LAST_MODIFICATION_DT", to_timestamp(concat(col('LAST_MOD_DATE'), lit(' '), col('LAST_MOD_TIME')), 'yyyyMMdd HHmmss'))
The result I would expect to see is something like:

LAST_MODIFICATION_DT | WORK_ORDER

However, I'm getting the following result:
Some data to work with:
WORK_ORDER | LAST_MOD_TIME
10000008   | null
11358186   | 142254
10000007   | 193402
10000009   | null
Any thoughts?
As far as I know, dataframes in Spark are immutable. Hence, once you have created a dataframe it can't change; .withColumn() returns a new dataframe, which you need to assign to a variable.
%python
import pyspark
from pyspark.sql.functions import *
df = spark.read.option("header","true").csv("<input file path>")
df1 = df.withColumn("LAST_MODIFICATION_DT", to_timestamp(concat(col('LAST_MOD_DATE'), lit(' '), col('LAST_MOD_TIME')), 'yyyyMMdd HHmmss'))
display(df1)
I am getting the output below, as expected. If this is not what you expect, please provide more info.