
Selecting the Hive execution engine

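Of the three Hive execution engines shown below, is one of them preferred when working in a Hadoop cluster? What are the use cases in which each one is the ideal choice?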

I tried a query on a sample of 400M, and the Tez engine returned results faster than the other two. The query mainly involves grouping and filtering.

set hive.execution.engine=spark;
set hive.execution.engine=tez;
set hive.execution.engine=mr;
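One way to see how each engine would run a given query is to compare the plans. The sketch below assumes a hypothetical table `sales(region STRING, amount DOUBLE, sale_date STRING)`; the table and column names are illustrative only. `SET hive.execution.engine;` with no value prints the current setting, and `EXPLAIN` prints the compiled plan without running the query.

```sql
-- Print the engine currently used by this session.
SET hive.execution.engine;

-- Plan under MapReduce: a filter + GROUP BY + ORDER BY query usually
-- compiles into more than one Map Reduce stage, and each stage writes
-- its output to HDFS before the next one starts.
SET hive.execution.engine=mr;
EXPLAIN
SELECT region, SUM(amount) AS total
FROM sales                      -- hypothetical table
WHERE sale_date >= '2020-01-01'
GROUP BY region
ORDER BY total DESC;

-- Plan under Tez: the same query usually appears as a single Tez stage
-- whose vertices (e.g. Map 1 -> Reducer 2 -> Reducer 3) form one DAG.
SET hive.execution.engine=tez;
EXPLAIN
SELECT region, SUM(amount) AS total
FROM sales                      -- hypothetical table
WHERE sale_date >= '2020-01-01'
GROUP BY region
ORDER BY total DESC;
```

The plan shows structure (how many stages and shuffles a query needs), not runtime. The exact output differs between Hive versions, and actual speed still depends on data size, cluster resources and configuration, so timing the query on representative data remains the reliable test.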

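I am trying to find out whether, just by looking at a query, one can tell that a particular engine will return results faster than the others.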

The benefits that Tez provides over the MapReduce execution engine when used with Hive are:
● Tez does not write intermediate results to HDFS between the steps of a Hive query. Tez models the query as a Directed Acyclic Graph (DAG), and the output of one step is passed directly to the next step in the graph instead of being written to HDFS, as happens between the chained jobs of the MapReduce engine. Removing this I/O saves a lot of time when dealing with large amounts of data.
● Tez and YARN together let you reuse containers, and the objects held in them, across tasks and across queries in the same Tez session. If two tasks need the same object (say, the hash table built for a map join) and run in the same container, the object does not have to be created again and again; it can be reused. This leads to better resource management and also improves performance.
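The container-reuse behaviour described above is controlled by configuration. The fragment below is a sketch of the relevant properties, assuming Hive on Tez. Because these settings are read when the Tez session and its containers start, they are often placed in `tez-site.xml` or `hive-site.xml` rather than set per query, and defaults may differ between Hive and Tez versions.

```sql
SET hive.execution.engine=tez;

-- Let the Tez ApplicationMaster run new tasks in containers it already
-- holds instead of requesting fresh ones from YARN (on by default in Tez).
SET tez.am.container.reuse.enabled=true;

-- Start some containers before the first query arrives so that it does
-- not pay the container start-up cost (the count is only an example).
SET hive.prewarm.enabled=true;
SET hive.prewarm.numcontainers=10;
```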

Please check here for information about the Spark engine:

https://community.cloudera.com/t5/Support-Questions/Hive-execution-engine-set-to-Spark-is-recommended/mp/177906

If you want to run interactive queries, LLAP (Live Long and Process), which runs on top of Tez, is a good fit.
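A minimal sketch of switching a session to LLAP, assuming the cluster already runs LLAP daemons and HiveServer2 is configured to use them (deploying the daemons is a separate step not covered here). LLAP executes on top of Tez, so the engine stays `tez`.

```sql
-- LLAP is not a separate engine: queries still compile to Tez DAGs.
SET hive.execution.engine=tez;

-- Run query fragments inside the long-lived LLAP daemons
-- instead of ordinary YARN containers.
SET hive.execution.mode=llap;

-- Allow all eligible operators to run in LLAP
-- (other accepted values include none, map and auto).
SET hive.llap.execution.mode=all;
```

The daemons stay up between queries and cache data in memory, which is what cuts latency for short, interactive queries compared with launching fresh containers each time.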


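Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.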

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM