
How to add complex logic to updateExpr in a Delta Table

I am updating a Delta table with some incremental records. Two of the fields only need a plain update, but a third one is a collection of maps, and for that field I would like to concatenate the incoming values with the existing ones instead of replacing them.

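import io.delta.tables.DeltaTable
import spark.implicits._  // for toDF; assumes a SparkSession named `spark`

val historicalDF = Seq(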
    (1, 0, "Roger", Seq(Map("score" -> 5, "year" -> 2012)))
).toDF("id", "ts", "user", "scores")

historicalDF.write
  .format("delta")
  .mode("overwrite")
  .save(table_path)

val hist_dt : DeltaTable = DeltaTable.forPath(spark, table_path)
    
val incrementalDF = Seq(
    (1, 1, "Roger Rabbit", Seq(Map("score" -> 7, "year" -> 2013)))
).toDF("id", "ts", "user", "scores")   

What I would like to have after the merge is something like this:

+---+---+------------+--------------------------------------------------------+
|id |ts |user        |scores                                                  |
+---+---+------------+--------------------------------------------------------+
|1  |1  |Roger Rabbit|[{score -> 7, year -> 2013}, {score -> 5, year -> 2012}]|
+---+---+------------+--------------------------------------------------------+

This is what I tried in order to perform the concatenation:

hist_dt
  .as("ex")
  .merge(incrementalDF.as("in"),
         "ex.id = in.id")
  .whenMatched
  .updateExpr(
    Map(
    "ts" -> "in.ts",
    "user" -> "in.user",
    "scores" -> "in.scores" ++ "ex.scores"
    )
  )
  .whenNotMatched
  .insertAll()
  .execute()

But "in.scores" and "ex.scores" are treated as plain Scala String values rather than columns, so I get the following error:

 error: value ++ is not a member of (String, String)

Is there a way to add some complex logic to updateExpr?
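For context on the error itself, here is a minimal plain-Scala sketch (no Spark required) of what the compiler sees. In Scala, `->` and `++` have the same precedence and associate to the left, so the failing line is parsed as `("scores" -> "in.scores") ++ "ex.scores"`, which is why `++` is looked up on a `(String, String)` tuple. The last entry of `updates` is an assumption rather than something from the attempt above: it relies on Spark SQL's `concat` function accepting array arguments (Spark 2.4 and later), and on `updateExpr` parsing each value as a SQL expression.

```scala
// Plain Scala: shows what each spelling of the map entry produces.

// With explicit parentheses the line compiles, but `++` only glues the two
// string literals together; it never touches the columns they name.
val glued = "scores" -> ("in.scores" ++ "ex.scores")

// Keeping the whole expression inside ONE string leaves the concatenation
// to Spark SQL when the map is handed to updateExpr.
// Assumption: `concat` on two array columns, available from Spark 2.4.
val updates: Map[String, String] = Map(
  "ts"     -> "in.ts",
  "user"   -> "in.user",
  "scores" -> "concat(in.scores, ex.scores)"
)
```

A map like `updates` could then be passed as `.updateExpr(updates)` in the merge above. Note that a plain `concat` keeps duplicate entries if the same incremental batch is merged twice.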

Using update() instead of updateExpr() lets me pass the required columns to a UDF, where I can put more complex logic:

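import org.apache.spark.sql.functions.{col, udf}

// Concatenates the incoming and existing score lists, tolerating nulls on either side.
def join_seq_map(incremental: Seq[Map[String,Integer]], existing: Seq[Map[String,Integer]]) : Seq[Map[String,Integer]] = {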
    (incremental, existing) match {
        case ( null , null) => null
        case ( null, e ) => e
        case ( i , null) => i
        case ( i , e ) => (i ++ e).distinct
    } 
}

def join_seq_map_udf = udf(join_seq_map _)

hist_dt
  .as("ex")
  .merge(
    incrementalDF.as("in"),
    "ex.id = in.id")
   .whenMatched("ex.ts < in.ts")
   .update(Map(
    "ts" -> col("in.ts"),
    "user" -> col("in.user"),
    "scores" -> join_seq_map_udf(col("in.scores"), col("ex.scores"))
   ))
  .whenNotMatched
  .insertAll()
  .execute()
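The behaviour of the merge function can be checked without Spark or Delta. The sketch below restates the same null handling and de-duplication in plain Scala; `joinSeqMap`, `incoming` and `stored` are illustrative names, and `Int` values are used instead of `Integer` purely for brevity.

```scala
// Same logic as join_seq_map above, restated for a quick local check.
// Option(...) turns a null Seq into None, mirroring the four null cases.
def joinSeqMap(
    incremental: Seq[Map[String, Int]],
    existing: Seq[Map[String, Int]]
): Seq[Map[String, Int]] =
  (Option(incremental), Option(existing)) match {
    case (None, None)       => null
    case (None, Some(e))    => e
    case (Some(i), None)    => i
    case (Some(i), Some(e)) => (i ++ e).distinct
  }

val incoming = Seq(Map("score" -> 7, "year" -> 2013))
val stored   = Seq(Map("score" -> 5, "year" -> 2012))

// Incoming entries come first, followed by the stored ones.
val merged = joinSeqMap(incoming, stored)

// Merging the same incoming batch again changes nothing,
// because distinct drops the repeated entry.
val rerun = joinSeqMap(incoming, merged)
```

Because of the `.distinct` call, replaying an incremental batch leaves the list unchanged, so the result differs from a plain concatenation whenever the two sides share an entry.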
