
How to add complex logic to updateExpr in a Delta Table

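I am updating a Delta table with some incremental records. Two of the fields only need a simple update, but a third is a collection of maps, and for that one I want to concatenate the incoming values with the existing ones instead of replacing them.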

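import io.delta.tables.DeltaTable
import spark.implicits._  // needed for toDF; assumes a SparkSession named `spark`

// Existing table contents: one row whose `scores` column is an array of maps.
val historicalDF = Seq(
  (1, 0, "Roger", Seq(Map("score" -> 5, "year" -> 2012)))
).toDF("id", "ts", "user", "scores")

// table_path is assumed to be defined elsewhere as the Delta table location.
historicalDF.write
  .format("delta")
  .mode("overwrite")
  .save(table_path)

val hist_dt: DeltaTable = DeltaTable.forPath(spark, table_path)

// Incoming record for the same id, with a newer ts and one new score entry.
val incrementalDF = Seq(
  (1, 1, "Roger Rabbit", Seq(Map("score" -> 7, "year" -> 2013)))
).toDF("id", "ts", "user", "scores")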

What I want after the merge is something like this:

+---+---+------------+--------------------------------------------------------+
|id |ts |user        |scores                                                  |
+---+---+------------+--------------------------------------------------------+
|1  |1  |Roger Rabbit|[{score -> 7, year -> 2013}, {score -> 5, year -> 2012}]|
+---+---+------------+--------------------------------------------------------+

This is how I tried to perform the concatenation:

hist_dt
  .as("ex")
  .merge(incrementalDF.as("in"),
         "ex.id = in.id")
  .whenMatched
  .updateExpr(
    Map(
    "ts" -> "in.ts",
    "user" -> "in.user",
    "scores" -> "in.scores" ++ "ex.scores"
    )
  )
  .whenNotMatched
  .insertAll()
  .execute()

However, "in.scores" and "ex.scores" are interpreted as String values rather than as columns, so I get the following error:

 error: value ++ is not a member of (String, String)

Is there a way to add some complex logic to updateExpr?
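
A note on the compiler error above: it appears to come from Scala operator precedence rather than from Delta. `->` and `++` have the same precedence and associate to the left, so the map entry is parsed as `("scores" -> "in.scores") ++ "ex.scores"`, and a tuple has no `++` method. Because updateExpr takes SQL expression strings, one option is to put the concatenation inside the string using Spark SQL's `concat` function, which accepts array columns in Spark 2.4 and later. The sketch below only builds the map in plain Scala. The object name `PrecedenceSketch` is hypothetical, and the expression has not been run against the table in the question.

```scala
object PrecedenceSketch {
  // "scores" -> "in.scores" ++ "ex.scores" does not compile: it is parsed as
  // ("scores" -> "in.scores") ++ "ex.scores", and Tuple2 has no `++`.

  // Adding parentheses compiles, but it only glues the two string literals
  // together, which is not a valid column expression.
  val parenthesised = "scores" -> ("in.scores" ++ "ex.scores")

  // Keeping the whole expression inside one SQL string lets Spark resolve
  // `in.scores` and `ex.scores` as columns and concatenate the two arrays.
  val updates: Map[String, String] = Map(
    "ts"     -> "in.ts",
    "user"   -> "in.user",
    "scores" -> "concat(in.scores, ex.scores)"
  )

  def main(args: Array[String]): Unit = {
    println(parenthesised._2)
    println(updates("scores"))
  }
}
```

The intended use would be to pass `PrecedenceSketch.updates` to `.updateExpr(...)` in place of the original map. This variant only concatenates, so repeated entries are kept.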

Using update() instead of updateExpr() lets me pass the required columns to a UDF, so I can put the more complex logic there:

import org.apache.spark.sql.functions.{col, udf}

// Concatenate incoming and existing scores, tolerating null on either side
// and dropping exact duplicates.
def join_seq_map(incremental: Seq[Map[String, Integer]], existing: Seq[Map[String, Integer]]): Seq[Map[String, Integer]] = {
  (incremental, existing) match {
    case (null, null) => null
    case (null, e)    => e
    case (i, null)    => i
    case (i, e)       => (i ++ e).distinct
  }
}

val join_seq_map_udf = udf(join_seq_map _)

hist_dt
  .as("ex")
  .merge(
    incrementalDF.as("in"),
    "ex.id = in.id")
  .whenMatched("ex.ts < in.ts")
  .update(Map(
    "ts" -> col("in.ts"),
    "user" -> col("in.user"),
    "scores" -> join_seq_map_udf(col("in.scores"), col("ex.scores"))
  ))
  .whenNotMatched
  .insertAll()
  .execute()
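
The null handling and de-duplication in join_seq_map can be checked without Spark or Delta. The following is a minimal sketch that repeats the function so it is self-contained. The object name `JoinSeqMapSketch`, the `Scores` alias and the `entry` helper are hypothetical and exist only for illustration.

```scala
object JoinSeqMapSketch {
  type Scores = Seq[Map[String, Integer]]

  // Same logic as join_seq_map in the answer, repeated to keep the sketch standalone.
  def join_seq_map(incremental: Scores, existing: Scores): Scores =
    (incremental, existing) match {
      case (null, null) => null
      case (null, e)    => e
      case (i, null)    => i
      case (i, e)       => (i ++ e).distinct
    }

  // Hypothetical helper: builds one score entry with boxed Integer values.
  def entry(score: Int, year: Int): Map[String, Integer] =
    Map("score" -> Int.box(score), "year" -> Int.box(year))

  val existing: Scores = Seq(entry(5, 2012))
  val incoming: Scores = Seq(entry(7, 2013))

  // Incoming entries come first, followed by the existing ones.
  val merged: Scores = join_seq_map(incoming, existing)

  // Applying the same increment again adds nothing, because of `distinct`.
  val replayed: Scores = join_seq_map(incoming, merged)

  // A null side is passed through unchanged.
  val onlyExisting: Scores = join_seq_map(null, existing)

  def main(args: Array[String]): Unit = {
    println(merged)
    println(replayed == merged)
    println(onlyExisting == existing)
  }
}
```

Because of `distinct`, re-running the merge with the same incremental record should leave scores unchanged, which a plain array concatenation would not guarantee.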

