I'm doing the following:

ALTER TABLE ... DROP IF EXISTS PARTITION (col='val1')

then removing the data from HDFS:

hdfs dfs -rm -r path_to_remove

After that, a job re-creates the partition (col='val1') and writes new Avro files under the HDFS folder. However,

sqlContext.sql("select count(0) from table1 where col='val1'").show

returns 0 until I run MSCK REPAIR TABLE. Is the repair step compulsory to see the data again in spark-sql? Please advise.
If it's an external table, yes, you need to repair the table. I don't think you need to do that with managed tables.
SparkSQL reads partition information from the Hive metastore; without an entry for that partition there, nothing can be counted, by Spark or any other tool that uses the metastore.
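If a full metastore re-scan is too heavy, you can re-register just the one partition instead. A minimal sketch, assuming the table name `table1` and the partition spec from the question (adjust the `LOCATION` clause if your partition directory does not follow the default layout):

```sql
-- Option 1: re-scan HDFS and register every partition missing from the metastore
MSCK REPAIR TABLE table1;

-- Option 2: cheaper when you know exactly which partition was re-created,
-- register only that partition
ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (col='val1');
```

Either statement can be issued from Spark via `sqlContext.sql("...")`; once the partition is back in the metastore, the count query should see the new Avro files.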