Convert string to date using pyspark
I have a pyspark dataframe with a string column in the format YYYYMMDD, and I am trying to convert it into a date column (the final date should be ISO 8601). The field is named deadline and is formatted as follows:
deadline
20190530
I tried the following solutions:
from pyspark.sql.functions import unix_timestamp, col, to_date, from_unixtime
from pyspark.sql.types import TimestampType, StringType, DateType
df.select(to_date(df.deadline).alias('dt')).show()
df.withColumn('new_date',to_date(unix_timestamp(df.deadline, 'YYYYMMDD').cast('timestamp'))).show()
orders_concat.select(unix_timestamp(orders_concat.deadline, 'YYYYMMDD')).show()
df.select(unix_timestamp(df.ts_string, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp")).show()
df.select(unix_timestamp(df.deadline, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp")).show()
df.select(to_date(unix_timestamp('deadline', 'YYYYMMDD').cast('timestamp')).alias('timestamp')).show()
ndf = df.withColumn('_1', df['deadline'].cast(DateType()))
df2 = df.select('deadline', from_unixtime(unix_timestamp('deadline', 'YYYYMMDD')).alias('date'))
I always get null values.
Does anyone have a suggestion?
Use the correct format, yyyyMMdd, and it works fine:
from pyspark.sql import functions as F
df.withColumn('new_date',F.to_date(F.unix_timestamp(df.deadline, 'yyyyMMdd').cast('timestamp'))).show()
+--------+----------+
|deadline| new_date|
+--------+----------+
|20190530|2019-05-30|
+--------+----------+