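Remove an element from an array of structs in Spark Scala
I want to implement a function that removes an element from an array of structs in Spark Scala. For the date "2019-01-26", I want to remove the entire struct from the array column. Here is my code: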
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._ // pre-imported in spark-shell; provides col, lit, struct, array, expr
import spark.implicits._                // pre-imported in spark-shell; provides toDF and $"..."

// Build the sample rows as strings, then cast the array elements to the target struct type.
val df = Seq(
  ("123", "Jack", Seq(
    ("2020-04-26", "200", "72", "ABC"), ("2020-05-26", "300", "71", "ABC"),
    ("2019-01-26", "200", "70", "DEF"), ("2019-01-26", "200", "70", "DEF"),
    ("2019-01-26", "200", "70", "DEF"))),
  ("124", "jones", Seq(
    ("2020-04-26", "200", "72", "ABC"), ("2020-05-26", "300", "71", "ABC"),
    ("2020-06-26", "200", "70", "ABC"), ("2020-08-26", "300", "69", "ABC"),
    ("2020-08-26", "300", "69", "ABC"))),
  ("125", "daniel", Seq(
    ("2019-01-26", "200", "70", "DEF"), ("2019-01-26", "200", "70", "DEF"),
    ("2019-01-26", "200", "70", "DEF"), ("2019-01-26", "200", "70", "DEF"),
    ("2019-01-26", "200", "70", "DEF")))
).toDF("id", "name", "history")
  .withColumn("history",
    $"history".cast("array<struct<infodate:Date,amount1:Integer,amount2:Integer,detail:string>>"))
scala> df.printSchema
root
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- history: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- infodate: date (nullable = true)
 |    |    |-- amount1: integer (nullable = true)
 |    |    |-- amount2: integer (nullable = true)
 |    |    |-- detail: string (nullable = true)
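So, for the date 2019-01-26, I want to drop every struct that carries it, so that those entries disappear from the array column.
I managed to find a solution, but it involves a lot of hard-coding, and I am looking for a better approach or suggestions.
Hard-coded solution: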
val dfnew = df
  .withColumn("history",
    array_except(
      col("history"),
      array(
        struct(
          lit("2019-01-26").cast(DataTypes.DateType).alias("infodate"),
          lit("200").cast(DataTypes.IntegerType).alias("amount1"),
          lit("70").cast(DataTypes.IntegerType).alias("amount2"),
          lit("DEF").alias("detail")
        )
      )
    )
  )
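Is there a way to improve this so that a single filter condition on the date "2019-01-26" alone removes the matching structs from the array column?
I use expr with filter here. The expression is just a string, so you can substitute the date with a variable, which reduces the hard-coding. filter (a higher-order SQL function available since Spark 2.4) is convenient because it lets you use SQL notation to refer to the fields inside the struct.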
scala> :paste
// Entering paste mode (ctrl-D to finish)
df.withColumn( "history" ,
expr( "filter( history , x -> x.infodate != '2019-01-26' )" )
).show(10,false)
// Exiting paste mode, now interpreting.
+---+------+--------------------------------------------------------------------------------------------------------------------------------------------+
|id |name |history |
+---+------+--------------------------------------------------------------------------------------------------------------------------------------------+
|123|Jack |[[2020-04-26, 200, 72, ABC], [2020-05-26, 300, 71, ABC]] |
|124|jones |[[2020-04-26, 200, 72, ABC], [2020-05-26, 300, 71, ABC], [2020-06-26, 200, 70, ABC], [2020-08-26, 300, 69, ABC], [2020-08-26, 300, 69, ABC]]|
|125|daniel|[] |
+---+------+--------------------------------------------------------------------------------------------------------------------------------------------+
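To make the "substitute the date with a variable" remark concrete, here is a minimal sketch rather than a definitive implementation. The names DropByDateSketch, dropByDate and dropByDateTyped are hypothetical and not part of the original answer. The second helper assumes Spark 3.0 or later, where a Column-based filter is available in org.apache.spark.sql.functions; on Spark 2.4 only the expr variant applies.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, filter, lit}

// Hypothetical helper object for illustration only.
object DropByDateSketch {

  // Removes every struct in `arrayCol` whose `infodate` equals `date` (yyyy-MM-dd).
  // Same technique as the answer: a SQL `filter` lambda wrapped in expr.
  def dropByDate(df: DataFrame, arrayCol: String, date: String): DataFrame = {
    // The value is spliced into a SQL string, so check its shape first.
    require(date.matches("""\d{4}-\d{2}-\d{2}"""), s"unexpected date format: $date")
    df.withColumn(arrayCol, expr(s"filter($arrayCol, x -> x.infodate != DATE '$date')"))
  }

  // Assumption: Spark 3.0+, where functions.filter accepts a Column => Column lambda.
  // No SQL string is built, so the date is passed as a literal Column instead.
  def dropByDateTyped(df: DataFrame, arrayCol: String, date: String): DataFrame =
    df.withColumn(
      arrayCol,
      filter(col(arrayCol), x => x.getField("infodate") =!= lit(date).cast("date")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-by-date-sketch")
      .getOrCreate()
    import spark.implicits._

    // Reduced version of the question's data: two fields per struct instead of four.
    val df = Seq(
      ("123", Seq(("2020-04-26", "ABC"), ("2019-01-26", "DEF"))),
      ("125", Seq(("2019-01-26", "DEF")))
    ).toDF("id", "history")
      .withColumn("history", $"history".cast("array<struct<infodate:date,detail:string>>"))

    dropByDate(df, "history", "2019-01-26").show(false)
    dropByDateTyped(df, "history", "2019-01-26").show(false)

    spark.stop()
  }
}
```

Note that filter keeps only the elements for which the predicate is true, so a struct whose infodate is null would also be dropped by either helper; add an explicit "x.infodate is null or ..." branch if such entries should be kept.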