Spark SQL queries on a partitioned table fail after partition files are removed
Below is what I am trying, in order.
Any insights into this behavior would be greatly appreciated.
Yes, MSCK REPAIR TABLE will only discover new partitions, not delete "old" ones.
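To illustrate the asymmetry (the table name `logs` and partition column `dt` are illustrative, not from the original question):

```sql
-- A new folder dt=2017-02-01 appears on HDFS: MSCK REPAIR registers it.
MSCK REPAIR TABLE logs;

-- But if the folder for dt=2017-01-01 is deleted from HDFS, its partition
-- entry stays in the metastore, and queries that touch it can fail with a
-- file-not-found error.
SELECT * FROM logs WHERE dt = '2017-01-01';
```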
When working with external Hive tables where you have deleted the HDFS folders, I see two solutions:
1. Drop the table (since the table is external, the underlying files are not deleted), recreate it with the same location, and then run MSCK REPAIR TABLE. This is my preferred solution.
2. Run ALTER TABLE <table> DROP PARTITION <partition> for every partition whose files you have deleted.
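A sketch of both approaches, again using the illustrative external table `logs` partitioned by `dt` at an assumed HDFS location:

```sql
-- Solution 1: drop and recreate the external table, then rediscover partitions.
-- Dropping an EXTERNAL table removes only the metastore entries, not the files.
DROP TABLE logs;
CREATE EXTERNAL TABLE logs (msg STRING)
PARTITIONED BY (dt STRING)
LOCATION 'hdfs:///data/logs';
MSCK REPAIR TABLE logs;  -- registers only the partitions still present on HDFS

-- Solution 2: explicitly drop each partition whose folder was deleted.
ALTER TABLE logs DROP IF EXISTS PARTITION (dt = '2017-01-01');
```

Solution 1 is simpler when many partitions were removed; solution 2 avoids touching the table definition when only a few are affected.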
What you observe in your case may be related to these issues: https://issues.apache.org/jira/browse/SPARK-15044 and https://issues.apache.org/jira/browse/SPARK-19187