
How to perform arithmetic operations when updating a Delta table?

I have a Delta table old that I want to merge with new. The new table contains some id values that are also present in the old table. For those overlapping ids, I want to update the cons values by summing the cons values from the old and new tables. How can I do that?

Try this:

In a Delta table merge, you can use arithmetic in the update expressions just as you would when defining any new Spark column.

import pyspark.sql.functions as F
from delta.tables import DeltaTable

# Create a sample Delta table with ids 0-499 and write it out.
spark.createDataFrame([{"id": i, "cons": 1, "cons2": 1} for i in range(500)])\
    .write.format("delta").mode("overwrite").option("overwriteSchema", "true")\
    .save("dbfs:/FileStore/anmol/sample_events_croma_before")

# New data: ids 450-549, so ids 450-499 overlap with the existing table.
new = spark.createDataFrame([{"id": i, "cons": 1, "cons2": 1} for i in range(450, 550)])

old = DeltaTable.forPath(spark, "dbfs:/FileStore/anmol/sample_events_croma_before")

old.alias("old")\
    .merge(new.alias("new"), "old.id = new.id")\
    .whenMatchedUpdate(set={
        # Update expressions can be SQL strings...
        "id": "new.id",
        "cons": "old.cons + new.cons",
        # ...or Column expressions; both forms are accepted.
        "cons2": F.col("old.cons2") + F.col("new.cons2"),
    })\
    .whenNotMatchedInsert(values={
        "id": "new.id",
        "cons": "new.cons",
        # Columns omitted here (e.g. cons2) are inserted as NULL for new rows.
    })\
    .execute()
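
As a quick sanity check, here is a minimal sketch (reusing the same path as above) that reads the merged table back and confirms that the overlapping ids 450-499 now have cons = 2, while newly inserted rows keep cons = 1:

# Read the merged Delta table back (same path as in the answer above).
result = spark.read.format("delta")\
    .load("dbfs:/FileStore/anmol/sample_events_croma_before")

# Overlapping ids 450-499 should have cons summed to 2.
result.filter("id BETWEEN 450 AND 499").select("id", "cons", "cons2").show(5)

# Newly inserted ids 500-549 get cons = 1 and a NULL cons2,
# because cons2 was not listed in whenNotMatchedInsert.
result.filter("id >= 500").select("id", "cons", "cons2").show(5)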
