
Hive query performance is bad

I am joining 3 huge tables (each with billions of rows) in Hive. All the statistics have been collected, but performance is still very bad (the query takes about 40 minutes).

Is there any parameter I can set at the Hive prompt to get better performance?

When I run the query, I see messages like:

Sep 4, 2015 7:40:23 AM INFO: parquet.hadoop.ParquetInputFormat: Total input paths to process : 1
Sep 4, 2015 7:40:23 AM INFO: parquet.hadoop.ParquetFileReader: reading another 1 footers

All the tables were created in BigSql with the storage clause "STORED AS PARQUETFILE".

How can I suppress the job progress details while a Hive query is running?

Regarding the Hive version:

 hive> set system:sun.java.command; system:sun.java.command=org.apache.hadoop.util.RunJar /opt/ibm/biginsights/hive/lib/hive-cli-0.12.0.jar org.apache.hadoop.hive.cli.CliDriver -hiveconf hive.aux.jars.path=file:///opt/ibm/biginsights/hive/lib/hive-hbase-handler-0.12.0.jar,file:///opt/ibm/biginsights/hive/lib/hive-contrib-0.12.0.jar,file:///opt/ibm/biginsights/hive/lib/hbase-client-0.96.0.jar,file:///opt/ibm/biginsights/hive/lib/hbase-common-0.96.0.jar,file:///opt/ibm/biginsights/hive/lib/hbase-hadoop2-compat-0.96.0.jar,file:///opt/ibm/biginsights/hive/lib/hbase-prefix-tree-0.96.0.jar,file:///opt/ibm/biginsights/hive/lib/hbase-protocol-0.96.0.jar,file:///opt/ibm/biginsights/hive/lib/hbase-server-0.96.0.jar,file:///opt/ibm/biginsights/hive/lib/htrace-core-2.01.jar,file:///opt/ibm/biginsights/hive/lib/zookeeper-3.4.5.jar,file:///opt/ibm/biginsights/sheets/libext/piggybank.jar,file:///opt/ibm/biginsights/sheets/libext/pig-0.11.1.jar,file:///opt/ibm/biginsights/sheets/libext/avro-1.7.4.jar,file:///opt/ibm/biginsights/sheets/libext/opencsv-1.8.jar,file:///opt/ibm/biginsights/sheets/libext/json-simple-1.1.jar,file:///opt/ibm/biginsights/sheets/libext/joda-time-1.6.jar,file:///opt/ibm/biginsights/sheets/libext/bigsheets.jar,file:///opt/ibm/biginsights/sheets/libext/bigsheets-serdes-1.0.0.jar,file:///opt/ibm/biginsights/lib/parquet/parquet-mr/parquet-column-1.3.2.jar,file:///opt/ibm/biginsights/lib/parquet/parquet-mr/parquet-common-1.3.2.jar,file:///opt/ibm/biginsights/lib/parquet/parquet-mr/parquet-encoding-1.3.2.jar,file:///opt/ibm/biginsights/lib/parquet/parquet-mr/parquet-generator-1.3.2.jar,file:///opt/ibm/biginsights/lib/parquet/parquet-mr/parquet-hadoop-bundle-1.3.2.jar,file:///opt/ibm/biginsights/lib/parquet/parquet-mr/parquet-hive-bundle-1.3.2.jar,file:///opt/ibm/biginsights/lib/parquet/parquet-mr/parquet-thrift-1.3.2.jar,file:///opt/ibm/biginsights/hive/lib/guava-11.0.2.jar 

Koushik, this question I asked a month back will give you good insight into the performance of ORC vs. Parquet.

Let me ask this question: what is the structure of your data? Is it nested or flat? If it is flat data, for example data ingested from an RDBMS, ORC is the better choice, since it stores lightweight indexes alongside the data, which makes data retrieval faster.
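If you want to try this, a minimal sketch might look like the following. The table names `big_table_parquet` and `big_table_orc` are hypothetical placeholders, and whether the filter pushdown setting actually helps depends on your Hive version and on the query, so treat it as something to test rather than a guaranteed fix.

```sql
-- Hypothetical table names; replace them with your own.
-- Copy an existing Parquet table into a new ORC table (CREATE TABLE AS SELECT).
CREATE TABLE big_table_orc
STORED AS ORC
AS
SELECT * FROM big_table_parquet;

-- Ask Hive to push filter predicates down to the storage layer,
-- so that ORC's built-in indexes can be used to skip data.
-- Assumption: your Hive build honors this setting for ORC.
SET hive.optimize.index.filter=true;
```

After converting each of the three tables, rerun the same join against the ORC copies and compare the timings with the Parquet run.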

Hope this helps.
