

What is the use of Hive's LLAP when there is Hive Tez?

Original post: 2018-04-24 07:07:09 | 9 2 | Tags: hadoop, hive, hdfs

In our project, we load data from a Greenplum database into HDFS (Hive). I recently learned that Hive 2 ships with a new component, LLAP, and I am confused about the concept. What exactly is LLAP for? If we already have Hive's Tez engine, what does LLAP add? A developer on our project told me that we are using Hive LLAP to load the data into Hive tables on HDFS. Is it good practice to use LLAP? If not, why not?

Could anyone clarify the questions above?

2 answers

https://cwiki.apache.org/confluence/display/Hive/LLAP is a good place to learn about Hive Live Long And Process (LLAP).

As the link says:

"LLAP works within existing, process-based Hive execution to preserve the scalability and versatility of Hive. It does not replace the existing execution model but rather enhances it."

and

"LLAP is not an execution engine (like MapReduce or Tez)"

Rather, it provides a long-lived daemon (hence the LL part of the acronym) to replace interactions with the DataNode, and this daemon also provides caching, pre-fetching, and some query processing. This allows simple queries to be largely processed by the daemon itself, with more complex queries being performed in YARN containers as usual.

The link also shows how the Tez AM (ApplicationMaster) can sit above all of this and submit Hive tasks that run via LLAP, which in turn interacts with the DataNode as required. In the example given there, the initial stages of the query are pushed into LLAP, but large shuffles are performed in separate containers.
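The split described above, with early stages running in the LLAP daemon and large shuffles running in separate containers, can be pictured with a toy Python sketch. Everything here is an assumption made for illustration: the `Fragment` class, the `kind` labels, and the `place` rule are hypothetical and are not Hive or Tez code. In a real cluster, placement is decided by Hive's planner and configuration, not by a rule like this.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of a query plan (hypothetical, for illustration only)."""
    name: str
    kind: str  # "scan", "filter", "shuffle": labels invented for this sketch


def place(fragment):
    """Hypothetical placement rule: early scan/filter work goes to the
    long-lived LLAP daemon, everything else to a regular YARN container."""
    if fragment.kind in {"scan", "filter"}:
        return "llap-daemon"
    return "yarn-container"


# A made-up three-stage plan, standing in for the work the Tez AM coordinates.
plan = [
    Fragment("read_orders", "scan"),
    Fragment("keep_2018_rows", "filter"),
    Fragment("join_customers", "shuffle"),
]

placement = {f.name: place(f) for f in plan}
for name, target in placement.items():
    print(f"{name} -> {target}")
```

The point of the sketch is only that the coordinator (Tez) stays the same in both cases; LLAP changes where some of the work runs, not what schedules it.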

LLAP nodes are an additional layer (one LLAP node per Hadoop DataNode) between Tez and the Hadoop DataNodes that can cache data and process some queries. Query execution is still scheduled and managed by Tez.

LLAP nodes run daemons that cache data, which can speed up queries when the same data is accessed repeatedly.

In short, LLAP boosts performance: queries that run through LLAP in Hive are typically faster. Hive also works without LLAP, but it can be slower.
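The caching benefit mentioned above can be illustrated with a minimal Python sketch. This is a toy model and not Hive code: the `ToyLlapDaemon` class, the chunk names, and the LRU eviction policy are all assumptions chosen to show why a process that outlives individual queries can avoid re-reading the same data from storage.

```python
from collections import OrderedDict


class ToyLlapDaemon:
    """Toy stand-in for a long-lived daemon with an in-memory LRU cache.

    Hypothetical: the class name and chunk-level cache are illustrative only.
    """

    def __init__(self, storage, capacity=2):
        self.storage = storage      # dict standing in for HDFS: chunk id -> rows
        self.capacity = capacity    # maximum number of cached chunks
        self.cache = OrderedDict()  # chunk id -> rows, oldest entry first
        self.storage_reads = 0      # number of reads that had to hit storage

    def read_chunk(self, chunk_id):
        if chunk_id in self.cache:
            self.cache.move_to_end(chunk_id)  # mark as most recently used
            return self.cache[chunk_id]
        self.storage_reads += 1               # cache miss: go to storage
        rows = self.storage[chunk_id]
        self.cache[chunk_id] = rows
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used chunk
        return rows


storage = {"sales_part_0": [10, 20, 30], "sales_part_1": [5, 15]}
daemon = ToyLlapDaemon(storage)

# Three "queries" touch the same chunk; the daemon outlives each of them.
totals = [sum(daemon.read_chunk("sales_part_0")) for _ in range(3)]
print(totals, daemon.storage_reads)  # prints: [60, 60, 60] 1
```

Only the first query pays for the storage read; the next two are served from memory. A process that starts fresh for every query would have nothing cached and would read from storage each time.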