
Remove an element from an array of structs in Spark Scala

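I want to implement a function that removes an element from an array of structs in Spark Scala. Every struct whose date is "2019-01-26" should be removed from the array column. Here is my code: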

    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions._

    // Sample data: each row has an id, a name, and a history of (date, amount1, amount2, detail) tuples.
    val df = Seq(
      ("123", "Jack", Seq(
        ("2020-04-26", "200", "72", "ABC"),
        ("2020-05-26", "300", "71", "ABC"),
        ("2019-01-26", "200", "70", "DEF"),
        ("2019-01-26", "200", "70", "DEF"),
        ("2019-01-26", "200", "70", "DEF"))),
      ("124", "jones", Seq(
        ("2020-04-26", "200", "72", "ABC"),
        ("2020-05-26", "300", "71", "ABC"),
        ("2020-06-26", "200", "70", "ABC"),
        ("2020-08-26", "300", "69", "ABC"),
        ("2020-08-26", "300", "69", "ABC"))),
      ("125", "daniel", Seq(
        ("2019-01-26", "200", "70", "DEF"),
        ("2019-01-26", "200", "70", "DEF"),
        ("2019-01-26", "200", "70", "DEF"),
        ("2019-01-26", "200", "70", "DEF"),
        ("2019-01-26", "200", "70", "DEF")))
    ).toDF("id", "name", "history").withColumn(
      "history",
      // Cast the tuples to a named struct with typed fields.
      $"history".cast("array<struct<infodate:Date,amount1:Integer,amount2:Integer,detail:string>>"))
    
    scala> df.printSchema
    root
     |-- id: string (nullable = true)
     |-- name: string (nullable = true)
     |-- history: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- infodate: date (nullable = true)
     |    |    |-- amount1: integer (nullable = true)
     |    |    |-- amount2: integer (nullable = true)
     |    |    |-- detail: string (nullable = true)


So for the date 2019-01-26, I want to remove every struct that carries that date, so that it no longer appears in the array column.


I managed to find a solution, but it involves a lot of hard-coding, and I am looking for a better solution or suggestions.

Hard-coded solution:

val dfnew=df
 .withColumn( "history" , 
  array_except(
   col("history"),
   array(
    struct(
     lit("2019-01-26").cast(DataTypes.DateType).alias("infodate"),
     lit("200").cast(DataTypes.IntegerType).alias("amount1"), 
     lit("70").cast(DataTypes.IntegerType).alias("amount2"), 
     lit("DEF").alias("detail")
    )
   )
  )
 )

Is there a way to improve this so that a single filter condition on the date "2019-01-26" removes the matching structs from the array column?

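I use expr with filter here. The expression is a plain string, so you can substitute the date with a value, which reduces the hard-coding. filter is a convenient expression because it lets you use SQL notation to refer to the sub-fields of the struct.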

scala> :paste
// Entering paste mode (ctrl-D to finish)

df.withColumn( "history" , 
  expr( "filter( history , x -> x.infodate != '2019-01-26' )" )
 ).show(10,false)

// Exiting paste mode, now interpreting.

+---+------+--------------------------------------------------------------------------------------------------------------------------------------------+
|id |name  |history                                                                                                                                     |
+---+------+--------------------------------------------------------------------------------------------------------------------------------------------+
|123|Jack  |[[2020-04-26, 200, 72, ABC], [2020-05-26, 300, 71, ABC]]                                                                                    |
|124|jones |[[2020-04-26, 200, 72, ABC], [2020-05-26, 300, 71, ABC], [2020-06-26, 200, 70, ABC], [2020-08-26, 300, 69, ABC], [2020-08-26, 300, 69, ABC]]|
|125|daniel|[]                                                                                                                                          |
+---+------+--------------------------------------------------------------------------------------------------------------------------------------------+
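
The date in the expression above is still a literal inside the string. As a minimal sketch of substituting it with a value, the helpers below build the same filter expression from a java.time.LocalDate. The names dropDateExpr and dropByDate are hypothetical and not part of the original answer, and the sketch assumes Spark 2.4 or later, where the SQL filter higher-order function is available, plus the df defined in the question.

```scala
import java.time.LocalDate
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

// Hypothetical helper: builds the SQL lambda that keeps every struct whose
// infodate differs from the given date. LocalDate.toString yields yyyy-MM-dd,
// so only a well-formed ISO date can end up inside the expression string.
def dropDateExpr(arrayCol: String, date: LocalDate): String =
  s"filter($arrayCol, x -> x.infodate != '$date')"

// Hypothetical helper: replaces the array column with its filtered version.
def dropByDate(df: DataFrame, arrayCol: String, date: LocalDate): DataFrame =
  df.withColumn(arrayCol, expr(dropDateExpr(arrayCol, date)))

// Usage with the DataFrame from the question:
// dropByDate(df, "history", LocalDate.parse("2019-01-26")).show(10, false)
```

Parsing the input into a LocalDate first, rather than interpolating an arbitrary string, is a design choice that keeps malformed values out of the SQL text. On Spark 3.0 and later, org.apache.spark.sql.functions also provides a filter function that takes a Column lambda, which avoids building a string at all.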
