Can we delete the latest version of a Delta table in Delta Lake?
I have a Delta table with 4 versions.
DESCRIBE HISTORY cfm shows 4 versions: 0, 1, 2, 3.
I want to delete version 3 or 2. How can I achieve this?
I tried:
from delta.tables import *
from pyspark.sql.functions import *
deltaTable = DeltaTable.forPath(spark, "path of cfm files")
deltaTable.delete("'version' = '3'")
This does not delete version 3. https://docs.delta.io/0.4.0/delta-update.html says:
"delete removes the data from the latest version of the Delta table but does not remove it from the physical storage until the old versions are explicitly vacuumed"
If I have to run the vacuum command, how do I use it on the latest dates rather than the older ones?
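For intuition about which versions fall inside or outside a retention window ("latest dates" versus "older dates"), the cutoff arithmetic can be sketched in plain Python. The version numbers and timestamps below are made up for illustration, and note that the real VACUUM removes unreferenced data files older than the cutoff rather than entries you pick by version number:

```python
from datetime import datetime, timedelta

def versions_outside_retention(history, retention_hours, now):
    """Return version numbers whose commit timestamp is older than the
    retention window -- these are the candidates VACUUM can clean up.
    `history` is a list of (version, timestamp) pairs, mimicking the
    shape of DESCRIBE HISTORY output."""
    cutoff = now - timedelta(hours=retention_hours)
    return [v for v, ts in history if ts < cutoff]

# Hypothetical commit times for a table with versions 0..3
now = datetime(2023, 5, 10, 12, 0)
history = [
    (0, datetime(2023, 5, 7, 9, 0)),
    (1, datetime(2023, 5, 8, 9, 0)),
    (2, datetime(2023, 5, 9, 9, 0)),
    (3, datetime(2023, 5, 10, 11, 0)),
]
print(versions_outside_retention(history, 24, now))  # [0, 1, 2]
```

With a 24-hour window, only the most recent commit survives the cutoff, which is why vacuuming cannot selectively remove the newest version while keeping older ones.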
You need to use the vacuum command to perform this operation. However, the default retention period for vacuum is 7 days, and it will error out if you try to vacuum with a retention window shorter than 7 days.
We can work around this by setting a Spark configuration that bypasses the default retention period check.
Solution below:
from delta.tables import *

# Bypass the default 7-day retention check so a shorter window is allowed
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

deltaTable = DeltaTable.forPath(spark, deltaPath)
deltaTable.vacuum(24)  # keep only the last 24 hours of history
*deltaPath -- the path to your Delta table
*24 -- the number of hours of history to retain; any versions created more than 24 hours ago have their underlying files deleted and can no longer be queried.
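The reason the configuration flag is needed can be shown with a toy model of the retention check. This is not Delta's actual implementation, just a sketch of the rule it enforces:

```python
DEFAULT_RETENTION_HOURS = 7 * 24  # Delta's default retention: 7 days

def vacuum_allowed(retention_hours, check_enabled=True):
    """Toy model of retentionDurationCheck: while the check is enabled,
    a retention window shorter than the 7-day default is rejected."""
    if check_enabled and retention_hours < DEFAULT_RETENTION_HOURS:
        raise ValueError(
            "retention below default; disable "
            "spark.databricks.delta.retentionDurationCheck.enabled first"
        )
    return True

# With the check disabled (as in the snippet above), 24 hours is accepted
print(vacuum_allowed(24, check_enabled=False))  # True
```

So vacuum(24) by itself would raise an error; it only succeeds after the check is switched off.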
Hope this answers your question.