
Spark SQL returns 0 records without repairing the Hive table

I'm doing the following:

  1. Delete the Hive partition using ALTER TABLE ... DROP IF EXISTS PARTITION (col='val1')
  2. hdfs dfs -rm -r path_to_remove
  3. Run an ingestion program that recreates the partition (col='val1') and writes Avro files under the HDFS folder
  4. sqlContext.sql("select count(0) from table1 where col='val1'").show returns 0 until MSCK REPAIR TABLE is run (see the sketch after this list)
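
For reference, here is a minimal sketch of the whole sequence in Scala, assuming Spark 2.x with Hive support (with Spark 1.x, sqlContext.sql behaves the same way); table1, col, and the HDFS path are placeholders taken from the question:

```scala
import org.apache.spark.sql.SparkSession

// Hive-enabled session; table1 / col / the HDFS path below are placeholders.
val spark = SparkSession.builder()
  .appName("partition-visibility-repro")
  .enableHiveSupport()
  .getOrCreate()

// Step 1: remove the partition from the Hive metastore.
spark.sql("ALTER TABLE table1 DROP IF EXISTS PARTITION (col='val1')")

// Step 2 happens outside Spark:
//   hdfs dfs -rm -r /path/to/table1/col=val1

// Step 3: the ingestion job writes new Avro files directly under
// /path/to/table1/col=val1 on HDFS, without registering the partition.

// Step 4: the data stays invisible until the metastore learns about the partition.
spark.sql("select count(0) from table1 where col='val1'").show() // 0
spark.sql("MSCK REPAIR TABLE table1")
spark.sql("select count(0) from table1 where col='val1'").show() // actual count
```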

Is the repair step compulsory to see the data again in Spark SQL? Please advise.

If it's an external table, yes, you need to repair the table. I don't think you need to do that with managed tables.
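
As a side note, when only a single partition was recreated, a full MSCK REPAIR TABLE scan can be avoided by registering just that partition; a hedged sketch, reusing the placeholder names from the question:

```scala
// Register only the recreated partition instead of rescanning the whole table.
// The LOCATION path is a placeholder; point it at the folder the ingestion wrote.
spark.sql("""
  ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (col='val1')
  LOCATION 'hdfs:///path/to/table1/col=val1'
""")
```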

Spark SQL reads partition information from the Hive metastore; without an entry for the partition there, nothing can be counted, whether by Spark or by any other tool that uses the metastore.
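
On Spark 2.1.1 and later, the same metastore sync that MSCK REPAIR TABLE performs is also exposed through the catalog API; a minimal sketch with the question's placeholder table name:

```scala
// Scans the table's directory tree on HDFS and adds any partitions
// that exist on disk but are missing from the Hive metastore.
spark.catalog.recoverPartitions("table1")
```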
