
Spark SQL: change the date format using a Spark expr

I'm using PySpark 2.4 with the code below. I have a DataFrame whose dates use French month names. I convert them to English month names so that the date can be parsed into the desired format (the date_desired column), and everything works when I use two expressions:

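from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

data = [
    (1, "20 mai 2021"),
    (1, "21 juin 2021"),
]

schema = StructType([
    StructField('montant', IntegerType(), False),
    StructField('date', StringType(), True),
])

col = ["montant", "date"]
df2 = spark.createDataFrame(data=data, schema=schema)
df2 = df2.select(col)

df2.show()

# Step 1: replace the French month name with its English equivalent.
# The "as rr" alias is overridden by the column name given to withColumn.
dd = df2.withColumn('date_expr', F.expr("""
    CASE WHEN rlike(date, 'mai')  THEN regexp_replace(date, 'mai', 'may')
         WHEN rlike(date, 'juin') THEN regexp_replace(date, 'juin', 'june')
         ELSE date
    END as rr
"""))

# Step 2: parse the English date string into a date.
dd = dd.withColumn('date_desired', F.expr("to_date(date_expr, 'dd MMMM yyyy')"))

dd.show()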



+-------+------------+
|montant|        date|
+-------+------------+
|      1| 20 mai 2021|
|      1|21 juin 2021|
+-------+------------+

+-------+------------+------------+------------+
|montant|        date|   date_expr|date_desired|
+-------+------------+------------+------------+
|      1| 20 mai 2021| 20 may 2021|  2021-05-20|
|      1|21 juin 2021|21 june 2021|  2021-06-21|
+-------+------------+------------+------------+


However, I want to achieve the same result with a single expression, as below:

dd =df2.withColumn('date_expr',F.expr(" CASE WHEN rlike(date,'mai')  THEN regexp_replace(date,'mai','may') \
                                     WHEN rlike(date,'juin') THEN regexp_replace(date,'juin','june') \
                                     ELSE date  \
                                     END as dt_col\
                                     to_date(dt_col ,'dd MMMM yyyy')"))

but I get a SQL syntax error.
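The syntax error most likely arises because F.expr parses exactly one expression: "END as dt_col to_date(...)" places a second expression right after the alias, and an alias cannot be referenced inside the expression that defines it. One possible way around this is to nest the replacement inside to_date, so no intermediate alias is needed. The sketch below only builds the SQL string; the helper name to_date_with_month_fix and its parameters are hypothetical, chosen for illustration.

```python
def to_date_with_month_fix(column, months, fmt="dd MMMM yyyy"):
    """Build one Spark SQL expression string: to_date over nested regexp_replace calls.

    The function name and parameters are hypothetical, used only for this sketch.
    """
    inner = column
    for french, english in months.items():
        # Each replacement wraps the previous one, so no intermediate alias is needed.
        inner = f"regexp_replace({inner}, '{french}', '{english}')"
    return f"to_date({inner}, '{fmt}')"


single_expr = to_date_with_month_fix("date", {"mai": "May", "juin": "June"})
print(single_expr)
```

With a live SparkSession, the resulting string could presumably be used as df2.withColumn('date_desired', F.expr(single_expr)). Wrapping the original CASE ... END block directly in to_date(..., 'dd MMMM yyyy') should be equivalent, provided the alias is dropped.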

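# Answer approach: split the date, translate the month through a map column, rebuild the string, parse it.
# Note: Column.withField requires Spark 3.1 or later; it is not available in PySpark 2.4.
from itertools import chain
from pyspark.sql import functions as F

# Build a map column from the French-to-English month dictionary using itertools.chain
d = {'mai': "May", 'juin': "June"}
m_expr1 = F.create_map([F.lit(x) for x in chain(*d.items())])

new = (df2.withColumn('new_date', F.split(df2['date'], r'\s'))
       # Convert the date parts into a struct column
       .withColumn('x', F.struct(*[F.col("new_date")[i].alias(f"val{i+1}") for i in range(3)]))
       # Map the French month name to the English one
       .withColumn("x", F.col("x").withField("val2", m_expr1[F.col("x.val2")]))
       # Convert the struct column back to a date string
       .select('montant', 'date', F.array_join(F.array('x.*'), ' ').alias('newdate'))
       # Parse the string into a date
       .withColumn('date_desired', F.expr("to_date(newdate, 'dd MMMM yyyy')"))
      )
new.show()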


+-------+------------+------------+------------+
|montant|        date|     newdate|date_desired|
+-------+------------+------------+------------+
|      1| 20 mai 2021| 20 May 2021|  2021-05-20|
|      1|21 juin 2021|21 June 2021|  2021-06-21|
+-------+------------+------------+------------+
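
To make the per-row logic of the split, map, and rejoin steps easier to follow, here is a plain-Python sketch of the same idea without Spark. The function name parse_french_date is hypothetical, and it assumes the default English month names for %B. Unlike the Spark map lookup, which yields null for a month missing from the map, this version leaves unknown months unchanged.

```python
from datetime import datetime

# Same two-entry mapping as in the snippet above; other months are not covered.
FRENCH_TO_ENGLISH = {"mai": "May", "juin": "June"}


def parse_french_date(text):
    """Hypothetical helper: split, translate the month, rejoin, then parse."""
    day, month, year = text.split()                  # like split(date, '\s')
    month = FRENCH_TO_ENGLISH.get(month, month)      # map lookup, unknown months kept as-is
    rebuilt = " ".join([day, month, year])           # like array_join(..., ' ')
    return datetime.strptime(rebuilt, "%d %B %Y").date()  # like to_date(..., 'dd MMMM yyyy')


print(parse_french_date("20 mai 2021"))
print(parse_french_date("21 juin 2021"))
```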

