
In a Hadoop cluster, should Hive be installed on all nodes? Install Pig

Original post: 2018-06-11 15:53:45 · hadoop / apache-pig

I am new to Hadoop / Pig and I have just started reading the docs.
There are lots of blogs on installing Hadoop in cluster mode.
I know that Pig runs on top of Hadoop.

My question is: Hadoop is installed on all the cluster nodes.
Should I also install Pig on all the cluster nodes, or only on the master node?

2 answers

You would want to install the Hive Metastore and the Hive Server on two different nodes. By default, Hive uses a Derby database, but most people choose MySQL, so there will also be a MySQL server daemon. To keep it simple:

Install HiveServer and the WebHCat Server on one node.
Install the Hive Metastore and the MySQL server on another node.

This is the best practice. If you have any other doubts, feel free to ask!
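As a rough illustration of the two-node layout described above, the sketch below renders the kind of `hive-site.xml` properties each node might carry. The host names, the database name `hive_metastore`, and the helper `hive_site` are hypothetical; the ports (3306 for MySQL, 9083 for the Metastore Thrift service) are the usual defaults, and the driver class assumes the older MySQL Connector/J naming. Treat it as a starting point, not a complete configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical host names; replace them with your own nodes.
METASTORE_HOST = "metastore-node.example.com"
MYSQL_HOST = "metastore-node.example.com"  # MySQL shares the Metastore node


def hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Metastore node: keep metadata in MySQL instead of the default Derby.
metastore_props = {
    "javax.jdo.option.ConnectionURL":
        f"jdbc:mysql://{MYSQL_HOST}:3306/hive_metastore",
    # Assumed driver class; newer connectors use com.mysql.cj.jdbc.Driver.
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
}

# HiveServer node: point at the remote Metastore over Thrift.
hiveserver_props = {
    "hive.metastore.uris": f"thrift://{METASTORE_HOST}:9083",
}

metastore_xml = hive_site(metastore_props)
hiveserver_xml = hive_site(hiveserver_props)
print(metastore_xml)
print(hiveserver_xml)
```

Database credentials (`javax.jdo.option.ConnectionUserName` and `javax.jdo.option.ConnectionPassword`) are left out on purpose and would need to be added on the Metastore node.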

I cannot tell if the question is about Hive or Pig, but there's a difference between clients and servers.

For Hive, the master services are the Metastore and HiveServer2. You can install these daemons on the same server to reduce network traffic between the Metastore and the Hive query compiler. You only need one client to communicate with those masters.
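To make the "one client" point concrete, here is a minimal sketch that assembles a Beeline command line for talking to HiveServer2 from a single client machine. The host name and the helper `beeline_command` are hypothetical, port 10000 is assumed as the usual HiveServer2 default, and the script only builds and prints the command rather than running it.

```python
import shlex


def beeline_command(host, port=10000, database="default",
                    query="SHOW TABLES;"):
    """Build a Beeline argv that connects to HiveServer2 over JDBC."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # -u passes the JDBC URL, -e runs a single query and exits.
    return ["beeline", "-u", url, "-e", query]


# Hypothetical HiveServer2 host; only the client machine needs Beeline.
cmd = beeline_command("hiveserver-node.example.com")
print(shlex.join(cmd))
```

The worker nodes never appear here: the client only needs network access to the HiveServer2 host.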

Pig communicates directly with YARN and HDFS (and optionally Hive, if you use HCatalog). Again, it's only a client, so only one host needs it.

It is generally preferred to have a dedicated set of machines for Hive and for the RDBMS backing the metastore (MySQL and Postgres being the more popular options).

You also don't need to "install Pig in the cluster". For example, I could grab the Hadoop XML configs and, after downloading Pig locally, run some Pig code against the YARN cluster from any outside computer (the same applies to Spark).
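A sketch of what that could look like from an outside machine: point `HADOOP_CONF_DIR` at the copied Hadoop XML configs and launch Pig in MapReduce mode. The directory path, the script name `wordcount.pig`, and the helper `pig_invocation` are hypothetical, and the sketch assumes a `pig` launcher is on the local `PATH`. It prepares the environment and arguments without executing anything.

```python
import os


def pig_invocation(conf_dir, script, base_env=None):
    """Return (env, argv) for running a Pig script against a remote cluster."""
    env = dict(os.environ if base_env is None else base_env)
    # Pig locates the cluster (HDFS, YARN) through the copied Hadoop configs.
    env["HADOOP_CONF_DIR"] = conf_dir
    # -x mapreduce submits jobs to the cluster instead of running locally.
    argv = ["pig", "-x", "mapreduce", script]
    return env, argv


env, argv = pig_invocation("/path/to/hadoop-conf", "wordcount.pig")
print("HADOOP_CONF_DIR=" + env["HADOOP_CONF_DIR"], " ".join(argv))
# To actually submit the job: subprocess.run(argv, env=env, check=True)
```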