Athena query error HIVE_BAD_DATA: Not valid Parquet file (.csv / .metadata)

I'm building an app that uses AWS Athena to query Snappy-compressed Parquet data. It mostly works. After every query execution, two files (a .csv and a .metadata file) are written to S3_OUTPUT_BUCKET, as expected. However, these two files break the next query, which fails with the following error:

HIVE_BAD_DATA: Not valid Parquet file: s3://MY_OUTPUT_BUCKET/logs/QUERY_NAME/2022/08/07/tables/894a1d10-0c1d-4de1-9e61-13b2b0f79e40.metadata expected magic number: PAR1 got: HP
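The error says the Parquet reader checked the file for the 4-byte magic number `PAR1` and found something else. A Parquet file starts with `PAR1` and ends with a 4-byte footer length followed by `PAR1` again. The .metadata (and .csv) files that Athena writes are not Parquet, so they fail this check. Below is a minimal sketch of that check. `looks_like_parquet` is a hypothetical helper that only inspects the magic bytes; it is not a full Parquet validator and not Athena's actual code.

```python
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(data: bytes) -> bool:
    """Cheap magic-number check (hypothetical helper, not a full validation).

    Layout assumed: b"PAR1" | column data | footer | 4-byte footer length | b"PAR1"
    """
    # Leading magic + footer length + trailing magic need at least 12 bytes.
    if len(data) < 12:
        return False
    return data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC
```

A query-result file dropped into the table folder fails this test, which is what the HIVE_BAD_DATA error reports.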

I have to delete those files manually before the next query will work. How can I make this work without deleting the files by hand? (As far as I know, I cannot exclude those files with a regex or similar.)

I have read the documentation on output files (Working with query results, recent queries, and output files), but it did not help.

Any help is appreciated.

When setting up Athena, you specify where the .csv and .metadata files produced by each query execution are written. This result location must be a different folder from the table location. Otherwise Athena picks up the result files as table data and fails when it tries to read them as Parquet.
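One way to guard against this in the app is to check, before running a query, that the result location is not the table location or nested inside it. This is a minimal sketch with a hypothetical helper `locations_overlap`. To stay conservative, it treats a result folder nested under the table folder (or the reverse) as an overlap. The bucket and prefix names in the test are placeholders.

```python
from urllib.parse import urlparse


def _bucket_and_prefix(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, folder-style prefix ending in '/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri}")
    prefix = parsed.path.lstrip("/")
    # Normalize so that "logs" does not falsely match "logs2/".
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def locations_overlap(table_location: str, output_location: str) -> bool:
    """True if query results would land in (or contain) the table's folder."""
    t_bucket, t_prefix = _bucket_and_prefix(table_location)
    o_bucket, o_prefix = _bucket_and_prefix(output_location)
    if t_bucket != o_bucket:
        return False
    return o_prefix.startswith(t_prefix) or t_prefix.startswith(o_prefix)
```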

In the Athena Query Editor, go to Settings > Manage and set the Query result location to a different S3 bucket from the table, or to a different folder within the same bucket.
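If the app starts queries through the AWS SDK instead of the console, the result location can also be set per query with `ResultConfiguration.OutputLocation` in boto3's `start_query_execution`. This is a sketch under that assumption. `build_query_request` and `start_query` are hypothetical helpers, and the database and bucket names in the test are placeholders. boto3 is imported inside `start_query` so the request builder can be used without AWS credentials.

```python
def build_query_request(query: str, database: str, output_location: str) -> dict:
    """Keyword arguments for boto3's Athena start_query_execution (hypothetical helper)."""
    # Folder-style prefix; keep this OUTSIDE every table's LOCATION.
    if not output_location.endswith("/"):
        output_location += "/"
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes each query's result and .metadata files under this prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(query: str, database: str, output_location: str) -> str:
    """Start the query and return its execution ID (requires boto3 and AWS credentials)."""
    import boto3  # lazy import: only needed when actually calling AWS

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_query_request(query, database, output_location)
    )
    return response["QueryExecutionId"]
```

If the workgroup is configured to override client-side settings, Athena uses the workgroup's result location instead. In that case, change the location in the workgroup settings.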
