
Hive external table with Parquet data not returning data on SELECT

I have a Hive external table backed by Parquet data, with no compression in use. A Spark job writes the Parquet files into the table's HDFS directory. But when I try to select data from the table, I get the warning below and no output appears. I am sure this is a common problem; please let me know how I can overcome it.
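For reference, the setup is roughly the following (a minimal sketch for Spark 1.6; the table name, schema, and HDFS path are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object WriteParquetForHive {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("write-parquet"))
        val sqlContext = new HiveContext(sc)
        import sqlContext.implicits._

        // No compression, matching the setup described above
        sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")

        // Hypothetical data; the real job gets this from upstream processing
        val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")

        // Write Parquet files into the directory the external table points at
        df.write.mode("append").parquet("hdfs:///data/mytable")

        // The matching external table, normally created once in Hive
        sqlContext.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS mytable (id INT, name STRING)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///data/mytable'""".stripMargin)
      }
    }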

Hive: 1.2.1000.2.5.0.0-1245
HDP: 2.5.0.0-1245
Spark: 1.6.2

Jun 1, 2017 5:04:27 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
    at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
    at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)

It seems the Parquet version used to write the files in the Spark job and the version Hive uses to read them are different, and the gap between them causes this. In the warning we can see that the created_by string involved is parquet-mr version 1.6.0.
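One way to verify which writer actually produced the files (a sketch; the file path is hypothetical) is to print the created_by string recorded in a file footer, since that is exactly the string the warning fails to parse:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.hadoop.ParquetFileReader

    object ShowCreatedBy {
      def main(args: Array[String]): Unit = {
        // Point this at one of the files the Spark job wrote
        val footer = ParquetFileReader.readFooter(
          new Configuration(),
          new Path("hdfs:///data/mytable/part-00000.parquet"))
        // created_by is the writer string stored in the footer,
        // e.g. "parquet-mr version 1.6.0"
        println(footer.getFileMetaData.getCreatedBy)
      }
    }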

Now, if anybody can tell me how to change the Parquet writer version used in the Spark job, OR how to change the Hive Parquet reader version, I can try that to resolve this problem.

The exception you are seeing is harmless. As the message itself says, Parquet is merely ignoring the min/max statistics in the file footer because the created_by string could not be parsed (see PARQUET-251); the data is still read normally.
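If you just want to silence the warning, one common workaround (a sketch, assuming the warning is emitted through java.util.logging, as the timestamped format above suggests) is to raise the log level on the org.apache.parquet loggers before querying:

    import java.util.logging.{Level, Logger}

    object SilenceParquetWarnings {
      // Keep a reference so the JUL logger is not garbage-collected
      // after its level has been set
      private val parquetLogger = Logger.getLogger("org.apache.parquet")

      // Drop WARNING entries (such as CorruptStatistics) but keep errors
      def apply(): Unit = parquetLogger.setLevel(Level.SEVERE)
    }

Note that this only hides the message; it does not change which Parquet version writes or reads the files.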
