Manually deleted data file from Delta Lake
I have manually deleted a data file from Delta Lake, and now the command below gives an error:
mydf = spark.read.format('delta').load('/mnt/path/data')
display(mydf)
Error:
A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement. For more information, see https://docs.microsoft.com/azure/databricks/delta/delta-intro#frequently-asked-questions
I have tried restarting the cluster with no luck, and also tried the following:
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
spark.conf.set("spark.databricks.io.cache.enabled", "false")
Any help on repairing the transaction log or fixing the error would be appreciated.
As explained before, you must use VACUUM to remove files: manually deleting files does not update the Delta transaction log, which is what Spark uses to identify which files to read.
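For future deletions, the recommended flow can be sketched as below. This is a minimal sketch using the path from the question; the `WHERE` predicate and the retention interval are illustrative placeholders, and it must run on a cluster with a `spark` session and Delta Lake available:

```python
# Delete rows through the Delta DELETE statement so the transaction
# log records the removal (the predicate below is a placeholder).
spark.sql("DELETE FROM delta.`/mnt/path/data` WHERE event_date < '2020-01-01'")

# Then VACUUM physically removes data files no longer referenced by
# the log. The 168-hour (7-day) retention shown is Delta's default.
spark.sql("VACUUM delta.`/mnt/path/data` RETAIN 168 HOURS")
```

This keeps the log and the file system consistent, so readers never reference a file that has been removed.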
In your case you can also use the FSCK REPAIR TABLE command. As per the docs: "Removes the file entries from the transaction log of a Delta table that can no longer be found in the underlying file system. This can happen when these files have been manually deleted."
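To repair the table in place, a minimal sketch using the path from the question (requires a Databricks cluster with a `spark` session; the DRY RUN pass is optional but lets you preview what would be dropped):

```python
# DRY RUN lists the files referenced by the log but missing on disk,
# without modifying the transaction log.
spark.sql("FSCK REPAIR TABLE delta.`/mnt/path/data` DRY RUN").show(truncate=False)

# Running without DRY RUN removes those entries from the log,
# after which the table can be read again.
spark.sql("FSCK REPAIR TABLE delta.`/mnt/path/data`")
```

Note that the rows contained in the deleted file are lost; FSCK only makes the log consistent with what remains on disk.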
The FSCK command worked for me. Thanks all.