
Spark SQL queries on a partitioned table fail after partition files are removed

Below is what I am trying, in order:

  1. Create a partitioned table in Hive based on the current hour.
  2. Use the Spark HiveContext to run MSCK REPAIR TABLE.
  3. Manually delete the HDFS folder of one of the added partitions.
  4. Use the Spark HiveContext again and run:
     a. MSCK REPAIR TABLE. This does not remove the already-added partition that no longer has an HDFS folder, which seems to be known behavior for MSCK REPAIR.
     b. select * from tablexxx where (existing partition); This fails with a FileNotFoundException pointing to the HDFS folder that was deleted manually.

Any insights into this behavior would be a great help.

Yes, MSCK REPAIR TABLE only discovers new partitions; it does not delete "old" ones.
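To see which registered partitions have lost their folder, one option is to compare the partition list from the metastore with what is actually on disk. The following is a minimal sketch, not part of Hive or Spark: the helper name `find_stale_partitions` is hypothetical, it assumes the default layout where each partition lives under `<table location>/<col>=<value>/...` (partitions added with a custom location are not handled), and it checks the local filesystem by default. For HDFS you would have to pass your own `exists` predicate, for example one that wraps `hdfs dfs -test -d`.

```python
import os
import tempfile


def find_stale_partitions(table_root, partition_specs, exists=os.path.isdir):
    """Return the partition specs whose directory is missing under table_root.

    partition_specs are strings in the form printed by SHOW PARTITIONS,
    e.g. "dt=2017-01-01/hr=05". `exists` is a pluggable predicate
    (an assumption of this sketch) so the same logic can be pointed at
    HDFS instead of the local filesystem.
    """
    stale = []
    for spec in partition_specs:
        # Default Hive layout: one directory level per partition column.
        path = os.path.join(table_root, *spec.split("/"))
        if not exists(path):
            stale.append(spec)
    return stale


# Small local demonstration: two partitions are registered, one folder is deleted.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "hr=01"))
    os.makedirs(os.path.join(root, "hr=02"))
    os.rmdir(os.path.join(root, "hr=02"))  # simulate the manual delete
    registered = ["hr=01", "hr=02"]        # what the metastore still lists
    print(find_stale_partitions(root, registered))  # ['hr=02']
```

The list of registered partitions would typically come from `SHOW PARTITIONS <table>`; the specs above are made up for the demonstration.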

When working with external Hive tables where you deleted the HDFS folder, I see two solutions:

  1. Drop the table (the files will not be deleted because the table is external), then re-create the table using the same location, and then run MSCK REPAIR TABLE. This is my preferred solution.
  2. Drop all the partitions you deleted using ALTER TABLE <table> DROP PARTITION <partition>.
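For the second solution, the DROP PARTITION statements can be generated rather than typed by hand. This is only a sketch under stated assumptions: `drop_partition_statement` is a hypothetical helper, the partition specs are assumed to be in the `col=value/col=value` form that SHOW PARTITIONS prints, values are assumed to contain no quotes or escaped characters, and every value is quoted as a string (which may need adjusting for your column types).

```python
def drop_partition_statement(table, partition_spec):
    """Build an ALTER TABLE ... DROP PARTITION statement for one partition.

    partition_spec is a string such as "dt=2017-01-01/hr=05".
    Values are quoted as strings; this sketch does no escaping.
    """
    clauses = []
    for part in partition_spec.split("/"):
        column, _, value = part.partition("=")
        clauses.append(f"{column}='{value}'")
    spec_sql = ", ".join(clauses)
    # IF EXISTS keeps the statement from failing if the partition is already gone.
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec_sql})"


# Example with a made-up partition spec for the table from the question.
for spec in ["dt=2017-01-01/hr=05"]:
    print(drop_partition_statement("tablexxx", spec))
# ALTER TABLE tablexxx DROP IF EXISTS PARTITION (dt='2017-01-01', hr='05')
```

Each generated string can then be passed to the `sql(...)` method of your HiveContext or SparkSession. For an external table, dropping the partition only removes the metastore entry, which is what is wanted here since the folder is already gone.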

What you observe in your case may be related to these: https://issues.apache.org/jira/browse/SPARK-15044 and https://issues.apache.org/jira/browse/SPARK-19187
