
Spark SQL: change date format using a Spark expr

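I am using PySpark 2.4 with the code below. I have a DataFrame whose dates contain French month names, and I convert them to English month names so that I can parse them into a date (the date_desired column). With two expressions, everything works: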

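from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

# Sample rows whose dates use French month names
data = [
    (1, "20 mai 2021"),
    (1, "21 juin 2021"),
]

schema = StructType([
    StructField('montant', IntegerType(), False),
    StructField('date', StringType(), True),
])

col = ["montant", "date"]
df2 = spark.createDataFrame(data=data, schema=schema)
df2 = df2.select(col)

df2.show()

# Step 1: replace the French month name with its English equivalent
dd = df2.withColumn('date_expr', F.expr("""
    CASE WHEN rlike(date, 'mai')  THEN regexp_replace(date, 'mai', 'may')
         WHEN rlike(date, 'juin') THEN regexp_replace(date, 'juin', 'june')
         ELSE date
    END
"""))

# Step 2: parse the English date string into a date
dd = dd.withColumn('date_desired', F.expr("to_date(date_expr, 'dd MMMM yyyy')"))

dd.show()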



+-------+------------+
|montant|        date|
+-------+------------+
|      1| 20 mai 2021|
|      1|21 juin 2021|
+-------+------------+

+-------+------------+------------+------------+
|montant|        date|   date_expr|date_desired|
+-------+------------+------------+------------+
|      1| 20 mai 2021| 20 may 2021|  2021-05-20|
|      1|21 juin 2021|21 june 2021|  2021-06-21|
+-------+------------+------------+------------+


However, I would like to achieve the same result with a single expression, like this:

dd =df2.withColumn('date_expr',F.expr(" CASE WHEN rlike(date,'mai')  THEN regexp_replace(date,'mai','may') \
                                     WHEN rlike(date,'juin') THEN regexp_replace(date,'juin','june') \
                                     ELSE date  \
                                     END as dt_col\
                                     to_date(dt_col ,'dd MMMM yyyy')"))

But I get a SQL syntax error.
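The syntax error is expected: an alias (`END as dt_col`) cannot be followed by another expression in the same string, and an alias cannot be referenced within the expression that defines it. One way around this is to nest the CASE directly inside to_date. The sketch below only builds the expression string; the helper name `build_to_date_expr` is hypothetical and not part of PySpark, and the result is meant to be passed to `F.expr`.

```python
def build_to_date_expr(column, replacements, fmt="dd MMMM yyyy"):
    """Return a Spark SQL expression string that nests a CASE inside to_date.

    `build_to_date_expr` is an illustrative helper, not a PySpark function.
    """
    # One WHEN branch per French month name to replace
    branches = " ".join(
        f"WHEN rlike({column}, '{src}') THEN regexp_replace({column}, '{src}', '{dst}')"
        for src, dst in replacements.items()
    )
    # The CASE result is passed straight to to_date, so no alias is needed
    return f"to_date(CASE {branches} ELSE {column} END, '{fmt}')"


single_expr = build_to_date_expr("date", {"mai": "may", "juin": "june"})
print(single_expr)

# Intended use with the DataFrame from the question (requires a Spark session):
#   dd = df2.withColumn('date_desired', F.expr(single_expr))
```

Because the month names are interpolated into the SQL text, this should only be used with trusted, hard-coded values.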

from itertools import chain
from pyspark.sql import functions as F

# Build a map literal from the French-to-English month names using itertools.chain
d = {'mai': "May", 'juin': "June"}
m_expr1 = F.create_map([F.lit(x) for x in chain(*d.items())])

# Note: Column.withField requires Spark 3.1 or later
new = (df2.withColumn('new_date', F.split(df2['date'], r'\s'))
       .withColumn('x', F.struct(*[F.col("new_date")[i].alias(f"val{i+1}") for i in range(3)]))  # convert the date parts into a struct column
       .withColumn("x", F.col("x").withField("val2", m_expr1[F.col("x.val2")]))  # map the month name
       .select('montant', 'date', F.array_join(F.array('x.*'), ' ').alias('newdate'))  # convert the struct back into a date string
       .withColumn('date_desired', F.expr(" to_date(newdate ,'dd MMMM yyyy') "))  # convert to a date
      )
new.show()


+-------+------------+------------+------------+
|montant|        date|     newdate|date_desired|
+-------+------------+------------+------------+
|      1| 20 mai 2021| 20 May 2021|  2021-05-20|
|      1|21 juin 2021|21 June 2021|  2021-06-21|
+-------+------------+------------+------------+
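The map-based code above uses Column.withField, which was added in Spark 3.1, so it will not run on PySpark 2.4 as mentioned in the question. A possible alternative that keeps the dictionary of month names but uses only regexp_replace and to_date is to fold the dictionary into nested regexp_replace calls. This is a sketch under assumptions: `build_nested_replace_expr` is a hypothetical helper, and it assumes no month name occurs as a substring of another value in the column.

```python
from functools import reduce


def build_nested_replace_expr(column, month_map, fmt="dd MMMM yyyy"):
    """Return a Spark SQL expression string with one regexp_replace per month.

    `build_nested_replace_expr` is an illustrative helper, not a PySpark function.
    """
    # Wrap the column in one regexp_replace call per (French, English) pair
    replaced = reduce(
        lambda inner, pair: f"regexp_replace({inner}, '{pair[0]}', '{pair[1]}')",
        month_map.items(),
        column,
    )
    return f"to_date({replaced}, '{fmt}')"


nested_expr = build_nested_replace_expr("date", {"mai": "May", "juin": "June"})
print(nested_expr)

# Intended use with the DataFrame from the question (requires a Spark session):
#   df2.withColumn('date_desired', F.expr(nested_expr)).show()
```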

