Hive Query Efficiency

Could you help me with a Hive query efficiency problem? I have two queries that solve the same problem, but I cannot figure out why one is much faster than the other. If you know, please share your insight. Any information is welcome!

Problem: I am trying to find the minimum value of a bunch of variables in a Hive Parquet table.

Queries: I tried two queries, as follows:

query 1

drop table if exists tb_1 purge;
create table if not exists tb_1 as
select 'v1' as name, min(v1) as min_value from src_tb union all
select 'v2' as name, min(v2) as min_value from src_tb union all
select 'v3' as name, min(v3) as min_value from src_tb union all
...
select 'v200' as name, min(v200) as min_value from src_tb
;

query 2

drop table if exists tb_2 purge;
create table if not exists tb_2 as
select min(v1) as min_v1
, min(v2) as min_v2
, min(v3) as min_v3
...
, min(v200) as min_v200
from src_tb
;

Result: Query 2 is much faster than query 1. The second query took about 5 minutes to finish. I don't know how long query 1 would take. Usually, after I submit a query, the system starts analyzing it and prints some compilation information in the terminal. With the first query, however, nothing appeared for a long time after submission, so I killed it.

What do you think? Thank you in advance.

Query execution time depends on the environment in which you execute the query.

In MSSQL:

Some people think query execution works like the algorithms they see in theoretical resources, but in practice it depends on other factors.

For example, both of your queries contain SELECT statements that run against a table, and at first glance they need to read all rows. However, the database server must analyze the statement to determine the most efficient way to extract the requested data. This is referred to as optimizing the SELECT statement. The component that does this is called the Query Optimizer. The input to the Query Optimizer consists of the query, the database schema (table and index definitions), and the database statistics. The output of the Query Optimizer is a query execution plan, sometimes referred to as a query plan or just a plan. (Please see this for more information about query-processing architecture.)

You can see the execution plan in MSSQL by reading this article, and I think you will understand better by looking at the execution plans for both of your queries.

Edit (Hive)

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query

A Hive query gets converted into a sequence of stages. The description of each stage shows a sequence of operators, along with the metadata associated with those operators.

Please see LanguageManual Explain for more information.
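As a rough sketch of how to compare the two query shapes from the question, EXPLAIN can be run on a cut-down version of each. Only two columns are shown here to keep the plans readable. The plan text differs between Hive versions and execution engines, so this is a way to inspect the plans, not a prediction of their exact output. A quick check is to count the TableScan operators on src_tb in each plan.

```sql
-- Shape of query 1: one aggregate per "union all" branch
-- (reduced to two branches for readability)
EXPLAIN
select 'v1' as name, min(v1) as min_value from src_tb union all
select 'v2' as name, min(v2) as min_value from src_tb;

-- Shape of query 2: all aggregates in a single select
EXPLAIN
select min(v1) as min_v1
, min(v2) as min_v2
from src_tb;
```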

What is surprising? The first query has to read src_tb a total of 200 times, once for each union all branch. The second reads the data once and performs 200 aggregations. It is a no-brainer that the second is faster.
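If the long (name, min_value) layout of query 1 is still wanted, one possible approach is to compute all minimums in a single pass, as in query 2, and then unpivot the single result row. The sketch below uses Hive's built-in stack() table function with LATERAL VIEW and shows only three columns. The table name tb_long and the aliases agg and t are hypothetical, and the sketch assumes all the v columns share one data type (otherwise each value would need a cast to a common type).

```sql
-- tb_long, agg and t are hypothetical names used for illustration
create table if not exists tb_long as
select t.name, t.min_value
from (
  -- single scan of src_tb, same shape as query 2
  select min(v1) as min_v1
  , min(v2) as min_v2
  , min(v3) as min_v3
  from src_tb
) agg
-- stack(3, ...) turns the three (name, value) pairs into three rows
lateral view stack(3,
  'v1', min_v1,
  'v2', min_v2,
  'v3', min_v3
) t as name, min_value;
```

The unpivot step only touches the one-row aggregate, so its cost should be negligible next to the scan of src_tb.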
