How to fix a corrupted Delta Lake table on AWS S3

I ended up manually deleting some Delta Lake entries (hosted on S3). Now my Spark job is failing because the Delta transaction log points to files that no longer exist in the file system. I came across https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-fsck.html but I am not sure how I should run this utility in my case.

You can do that by following the document you linked.

If you have a Hive table on top of your S3 data, this is how I did it:

%sql
FSCK REPAIR TABLE schema.testtable DRY RUN

With DRY RUN, the command only lists the file entries that would be removed from the transaction log; it does not change anything. Run it first and verify that the listed files are the ones that actually need to be removed.

Once you have verified the list, run the same command without DRY RUN and it should do what you need.

%sql
FSCK REPAIR TABLE schema.testtable
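
If you want to cross-check what DRY RUN reports, the idea can be approximated by hand: replay the JSON commits in the table's _delta_log folder and see which referenced data files are gone. The following is only a rough sketch under several assumptions, not how FSCK is implemented: the function name find_missing_files is made up for illustration, the table is reachable as a local or mounted path, file paths in the log are relative to the table root, and Parquet checkpoint files are ignored.

```python
import json
from pathlib import Path
from urllib.parse import unquote


def find_missing_files(table_root):
    """Return data files that the JSON commits still reference but that are absent.

    Hypothetical helper: reads only the *.json commits under _delta_log and
    skips checkpoint files, so it is an approximation of the table state.
    """
    root = Path(table_root)
    live = set()
    # Commit file names are zero-padded version numbers, so a plain sort
    # replays them in version order.
    for commit in sorted((root / "_delta_log").glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                # Paths in the log are URL-encoded, so decode them first.
                live.add(unquote(action["add"]["path"]))
            elif "remove" in action:
                live.discard(unquote(action["remove"]["path"]))
    # Anything still "live" in the log but not on disk is a dangling entry.
    return sorted(p for p in live if not (root / p).exists())
```

On Databricks with a mounted bucket, a call might look like find_missing_files("/dbfs/mnt/S3bucket/tables/testtable"), assuming the mount is exposed under /dbfs. The result should be compared against the DRY RUN output rather than trusted on its own.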

If you have not created a Hive table and only have a path (Delta table) where the files are stored, you can do it as below:

%sql
FSCK REPAIR TABLE delta.`dbfs:/mnt/S3bucket/tables/testtable` DRY RUN

I am doing this from Databricks and have mounted my S3 bucket path to Databricks. Make sure the path is enclosed in backticks (the ` character), with the opening one directly after delta. and before the actual path; otherwise it won't work.

Here too, to perform the actual repair, remove DRY RUN from the above command and it should do what you want.
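
If you would rather issue these statements from a Python cell, a small helper can build the text and take care of the backticks for the path form. This is just a sketch: fsck_statement and its parameters are names invented here, and the helper only produces a string.

```python
def fsck_statement(target, is_path=False, dry_run=True):
    """Build an FSCK REPAIR TABLE statement for a table name or a Delta path.

    Hypothetical helper: it only formats the SQL text and never runs it.
    """
    if is_path:
        if "`" in target:
            raise ValueError("path must not contain a backtick")
        # The path form needs backticks right after "delta." and after the path.
        target = f"delta.`{target}`"
    statement = f"FSCK REPAIR TABLE {target}"
    return statement + " DRY RUN" if dry_run else statement


# Preview first, then repair, mirroring the two steps described above.
preview = fsck_statement("dbfs:/mnt/S3bucket/tables/testtable", is_path=True)
repair = fsck_statement("dbfs:/mnt/S3bucket/tables/testtable", is_path=True, dry_run=False)
```

In a Databricks notebook, where a spark session is already defined, the strings could then be passed to spark.sql(preview) and, after checking the result, spark.sql(repair).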
