Tricky upsert in Delta table using spark
I have a case class as follows:
case class UniqueException(
ExceptionId:String,
LastUpdateTime:Timestamp,
IsDisplayed:Boolean,
Message:String,
ExceptionType:String,
ExceptionMessage:String,
FullException:String
)
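It is used to generate a Delta table.
The following conditions need to be met:
I wrote the code below, but it does not satisfy the conditions above.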
val dfUniqueException = DeltaTable.forPath(outputFolder)
dfUniqueException.as("existing")
.merge(dfNewExceptions.as("new"), "new.ExceptionId = existing.ExceptionId and new.LastUpdateTime > date_add(existing.LastUpdateTime, 14)")
.whenMatched().updateAll()
.whenNotMatched().insertAll()
.execute()
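Any idea how to satisfy the conditions above with a single merge statement?
In fact, your rules can be rewritten as follows:
If the LastUpdateTime difference is more than 14 days, update the existing row.
So you can change your code to put the "14-day rule" in the whenMatched clause instead of the merge clause, as follows: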
import io.delta.tables.DeltaTable
val dfUniqueException = DeltaTable.forPath(outputFolder)
val dfNewExceptionLabeled = dfNewExceptions.as("new")
dfUniqueException.as("existing")
.merge(dfNewExceptionLabeled, "new.ExceptionId = existing.ExceptionId")
.whenMatched("new.LastUpdateTime > date_add(existing.LastUpdateTime, 14)")
.updateAll()
.whenNotMatched()
.insertAll()
.execute()
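To make the three outcomes of this merge easier to see (update, keep, insert), here is a minimal sketch in plain Scala with no Spark involved. ExceptionRow, MergeSketch and olderThan14Days are hypothetical names introduced only for illustration, and the row type keeps just three of the columns. It assumes at most one incoming row per ExceptionId, and it models the date_add comparison as "later than midnight of the date 14 days after the existing timestamp", which is an assumption about how that expression is evaluated rather than a statement about Delta internals.

```scala
import java.time.LocalDateTime

// Hypothetical, simplified stand-in for UniqueException (three columns only).
final case class ExceptionRow(exceptionId: String, lastUpdateTime: LocalDateTime, message: String)

object MergeSketch {
  // Models: new.LastUpdateTime > date_add(existing.LastUpdateTime, 14)
  // Assumption: the existing timestamp is reduced to its date before adding 14 days.
  def olderThan14Days(existing: ExceptionRow, incoming: ExceptionRow): Boolean =
    incoming.lastUpdateTime.isAfter(
      existing.lastUpdateTime.toLocalDate.plusDays(14).atStartOfDay)

  def merge(existing: Seq[ExceptionRow], incoming: Seq[ExceptionRow]): Seq[ExceptionRow] = {
    // Assumes at most one incoming row per exceptionId.
    val incomingById = incoming.map(r => r.exceptionId -> r).toMap

    // Matched on ExceptionId: replace the row only when the 14-day rule holds,
    // otherwise keep the existing row untouched.
    val keptOrUpdated = existing.map { old =>
      incomingById.get(old.exceptionId) match {
        case Some(nw) if olderThan14Days(old, nw) => nw
        case _                                    => old
      }
    }

    // Not matched on ExceptionId: insert the incoming row.
    val existingIds = existing.map(_.exceptionId).toSet
    val inserted = incoming.filterNot(r => existingIds.contains(r.exceptionId))

    keptOrUpdated ++ inserted
  }
}
```

The point of the sketch is that matching is decided by ExceptionId alone, so an incoming row that fails the 14-day rule is still "matched" and is simply ignored instead of being inserted.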
If you apply this code to the following existing exceptions:
+---------------+-------------------+--------+
|ExceptionId |LastUpdateTime |Message |
+---------------+-------------------+--------+
|exception_id_01|2021-03-10 00:00:00|value_01|
|exception_id_02|2021-03-10 00:00:00|value_02|
|exception_id_03|2021-03-10 00:00:00|value_03|
+---------------+-------------------+--------+
and the following new exceptions:
+---------------+-------------------+--------+
|ExceptionId |LastUpdateTime |Message |
+---------------+-------------------+--------+
|exception_id_02|2021-03-20 00:00:00|value_04|
|exception_id_03|2021-03-31 00:00:00|value_05|
|exception_id_04|2021-03-31 00:00:00|value_06|
+---------------+-------------------+--------+
the final result in your Delta table is:
+---------------+-------------------+--------+
|ExceptionId |LastUpdateTime |Message |
+---------------+-------------------+--------+
|exception_id_04|2021-03-31 00:00:00|value_06|
|exception_id_01|2021-03-10 00:00:00|value_01|
|exception_id_03|2021-03-31 00:00:00|value_05|
|exception_id_02|2021-03-10 00:00:00|value_02|
+---------------+-------------------+--------+
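For contrast, the same sample data can be run through a plain-Scala model of the condition from the question, where the 14-day rule is part of the match predicate. SampleRow and OriginalConditionSketch are hypothetical names used only for this illustration, and the date comparison is modelled the same way as an assumption (midnight of the date 14 days after the existing timestamp). In this model exception_id_02 fails the match predicate, is therefore treated as not matched, and gets inserted as a second row.

```scala
import java.time.LocalDateTime

// Hypothetical, simplified row type: only the columns shown in the tables above.
final case class SampleRow(id: String, updated: LocalDateTime, message: String)

object OriginalConditionSketch {
  // The question's predicate: same id AND the 14-day rule, both in the merge condition.
  def matches(existing: SampleRow, incoming: SampleRow): Boolean =
    existing.id == incoming.id &&
      incoming.updated.isAfter(existing.updated.toLocalDate.plusDays(14).atStartOfDay)

  def merge(existing: Seq[SampleRow], incoming: Seq[SampleRow]): Seq[SampleRow] = {
    // whenMatched().updateAll(): overwrite target rows that have a matching source row.
    val updated = existing.map(old => incoming.find(matches(old, _)).getOrElse(old))
    // whenNotMatched().insertAll(): source rows without any matching target row are inserted,
    // including rows whose id already exists but which fail the 14-day part of the predicate.
    val inserted = incoming.filterNot(nw => existing.exists(matches(_, nw)))
    updated ++ inserted
  }

  private def day(d: Int): LocalDateTime = LocalDateTime.of(2021, 3, d, 0, 0)

  val existing: Seq[SampleRow] = Seq(
    SampleRow("exception_id_01", day(10), "value_01"),
    SampleRow("exception_id_02", day(10), "value_02"),
    SampleRow("exception_id_03", day(10), "value_03"))

  val incoming: Seq[SampleRow] = Seq(
    SampleRow("exception_id_02", day(20), "value_04"),
    SampleRow("exception_id_03", day(31), "value_05"),
    SampleRow("exception_id_04", day(31), "value_06"))

  // Five rows in this model: exception_id_02 appears twice (value_02 and value_04).
  val result: Seq[SampleRow] = merge(existing, incoming)
}
```

This is why moving the rule into whenMatched matters: there the rule only decides whether a matched row is updated, and it can no longer turn an existing id into an insert.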