In a Hadoop cluster, should Hive be installed on all nodes? Install Pig

I am new to Hadoop and Pig, and I have just started reading the docs.
There are lots of blogs on installing Hadoop in cluster mode.
I know that Pig runs on top of Hadoop.

My question is: Hadoop is installed on all the cluster nodes.
Should I also install Pig on all the cluster nodes, or only on the master node?

You would want to install the Hive Metastore and HiveServer on two different nodes. By default, Hive uses an embedded Derby database for the metastore, but most people choose MySQL instead, so there will also be a MySQL server daemon. To keep it simple:

  1. Install HiveServer and the WebHCat Server on one node.
  2. Install the Hive Metastore and the MySQL server on another node.

This is generally considered best practice. If you have any other questions, feel free to ask!
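
As a rough illustration of the two-node layout above, the sketch below renders the hive-site.xml properties that point the metastore at MySQL and expose it over Thrift. The property names are standard Hive settings; the host name, database name (`metastore`), user (`hive`), and the Connector/J driver class are assumptions made for the example, not values from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical host name; replace with the node that runs the metastore and MySQL.
METASTORE_HOST = "metastore-node.example.com"
MYSQL_HOST = METASTORE_HOST  # MySQL shares the metastore node in this layout


def hive_site_xml(properties):
    """Render a dict of Hive properties as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


metastore_properties = {
    # JDBC settings the metastore uses to reach MySQL instead of embedded Derby.
    # Database name, user, and driver class are assumptions for this sketch.
    "javax.jdo.option.ConnectionURL": f"jdbc:mysql://{MYSQL_HOST}:3306/metastore",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    # Thrift endpoint that HiveServer on the other node uses to reach the metastore.
    "hive.metastore.uris": f"thrift://{METASTORE_HOST}:9083",
}

print(hive_site_xml(metastore_properties))
```

The HiveServer node would only need the `hive.metastore.uris` entry, since it talks to the metastore service rather than to MySQL directly.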

I cannot tell whether the question is about Hive or Pig, but either way there is a difference between clients and servers.

For Hive, the master services are the Metastore and HiveServer2. You can install these daemons on the same server to reduce network traffic between the metastore and the Hive query compiler. You only need one client to communicate with those masters.
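
To make the client side concrete, here is a minimal sketch that builds a Beeline command line for a single client machine talking to HiveServer2. The `-u` and `-n` options and the `jdbc:hive2://` URL scheme are Beeline's own; the host name and user are hypothetical, and port 10000 is assumed to be the unchanged HiveServer2 default.

```python
import shlex


def beeline_command(host, port=10000, database="default", user=None):
    """Build a Beeline invocation for a HiveServer2 instance."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    cmd = ["beeline", "-u", url]
    if user:
        cmd += ["-n", user]  # -n passes the user name to connect as
    return cmd


# Hypothetical host and user, for illustration only.
cmd = beeline_command("hiveserver-node.example.com", user="analyst")
print(shlex.join(cmd))
```

Only the machine that runs this command needs the Hive client installed; the worker nodes do not.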

Pig communicates directly with YARN and HDFS (and optionally with Hive, if you use HCatalog). Again, it is only a client, so only one host needs it.

It is generally preferred to have a dedicated set of machines for Hive and for the RDBMS backing the metastore (MySQL and Postgres being the more popular options).

You also don't need to "install Pig in the cluster". For example, after downloading Pig locally, I could grab the Hadoop XML configs and run Pig code against the YARN cluster from any outside computer (the same applies to Spark).
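
A minimal sketch of that outside-machine setup, assuming Pig is unpacked locally and the cluster's XML configs have been copied to a local directory. It only prepares the command and environment; the directory path and script name are hypothetical, and it relies on Pig reading the Hadoop settings from `HADOOP_CONF_DIR`.

```python
import os


def pig_invocation(hadoop_conf_dir, script_path):
    """Return (command, environment) for running a Pig script on a remote YARN cluster."""
    env = dict(os.environ)
    # Directory holding the copied core-site.xml, hdfs-site.xml, yarn-site.xml, etc.
    env["HADOOP_CONF_DIR"] = hadoop_conf_dir
    # -x mapreduce submits jobs to the cluster instead of running in local mode.
    command = ["pig", "-x", "mapreduce", script_path]
    return command, env


# Hypothetical paths, for illustration only.
command, env = pig_invocation("/path/to/copied/hadoop-conf", "wordcount.pig")
print(command, env["HADOOP_CONF_DIR"])
```

Once Pig is on the local `PATH`, passing these to `subprocess.run(command, env=env, check=True)` would launch the job from the client machine.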
