
Hive external table with Parquet data not returning data on SELECT

I have a Hive external table backed by Parquet data, with no compression in use. A Spark job writes the Parquet files into the table's HDFS directory. But when I try to select data from the table, I get the warning below and no output appears. I am sure this is a common problem; please let me know how I can overcome it.
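For reference, the setup is roughly the following (a minimal sketch for Spark 1.6; the table name, schema, and HDFS path are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object WriteParquetForHive {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("write-parquet"))
        val sqlContext = new HiveContext(sc)
        import sqlContext.implicits._

        // No compression, matching the setup described above
        sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")

        // Hypothetical data; the real job gets this from upstream processing
        val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")

        // Write Parquet files into the directory the external table points at
        df.write.mode("append").parquet("hdfs:///data/mytable")

        // The matching external table, normally created once in Hive
        sqlContext.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS mytable (id INT, name STRING)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///data/mytable'""".stripMargin)
      }
    }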

Hive: 1.2.1000.2.5.0.0-1245
HDP: 2.5.0.0-1245
Spark: 1.6.2

Jun 1, 2017 5:04:27 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
    at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
    at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)

It seems the Parquet version used to write the files in the Spark job and the version Hive uses to read them are different, and the gap between them causes this. In the warning we can see that the created_by string involved is parquet-mr version 1.6.0.
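One way to verify which writer actually produced the files (a sketch; the file path is hypothetical) is to print the created_by string recorded in a file footer, since that is exactly the string the warning fails to parse:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.hadoop.ParquetFileReader

    object ShowCreatedBy {
      def main(args: Array[String]): Unit = {
        // Point this at one of the files the Spark job wrote
        val footer = ParquetFileReader.readFooter(
          new Configuration(),
          new Path("hdfs:///data/mytable/part-00000.parquet"))
        // created_by is the writer string stored in the footer,
        // e.g. "parquet-mr version 1.6.0"
        println(footer.getFileMetaData.getCreatedBy)
      }
    }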

Now, if anybody can tell me how to change the Parquet writer version used in the Spark job, OR how to change the Hive Parquet reader version, I can try that to resolve this problem.

The exception you are seeing is harmless. As the message itself says, Parquet is merely ignoring the min/max statistics in the file footer because the created_by string could not be parsed (see PARQUET-251); the data is still read normally.
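If you just want to silence the warning, one common workaround (a sketch, assuming the warning is emitted through java.util.logging, as the timestamped format above suggests) is to raise the log level on the org.apache.parquet loggers before querying:

    import java.util.logging.{Level, Logger}

    object SilenceParquetWarnings {
      // Keep a reference so the JUL logger is not garbage-collected
      // after its level has been set
      private val parquetLogger = Logger.getLogger("org.apache.parquet")

      // Drop WARNING entries (such as CorruptStatistics) but keep errors
      def apply(): Unit = parquetLogger.setLevel(Level.SEVERE)
    }

Note that this only hides the message; it does not change which Parquet version writes or reads the files.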
