

Is it correct that the master runs on a DataNode?

Asked 2018-08-30 16:36:50. Tags: hadoop, yarn, giraph

I'm using Giraph 1.3 built with the YARN profile. To start, I configured 1 NameNode and 2 DataNodes on an EC2 cluster. My application works correctly: I see the expected output in the logs (and in the output directory). I launched Giraph with the "-w 2" argument because I have two DataNodes.

In the userlogs of datanode1 I found the log of the first worker.
In the userlogs of datanode2 I found the log of the second worker and the log of the master as well.

I expected to find the master's log on the NameNode, i.e. I expected the master to run on the NameNode. Is that right?

Or do I have to configure another DataNode, and then I will find the master logs on that new DataNode?

2 answers

As I understand it, Hadoop/Giraph works by creating containers on the DataNodes. Hadoop creates a container for the application master, then Giraph creates a container for the master. Furthermore, Giraph creates one container for each worker requested with the -w parameter.
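To make the counting concrete, here is a minimal Python sketch of the container layout described above for "-w 2": one YARN application master, one Giraph master and one container per worker. It assumes NodeManagers run only on the two DataNodes (so the NameNode host receives no containers), and it places containers round-robin in request order. That placement rule is a hypothetical simplification; the real YARN scheduler also considers free memory, locality and queue limits, so actual hosts can differ from run to run.

```python
from itertools import cycle


def place_containers(num_workers, node_managers):
    """Assign the containers of one Giraph-on-YARN job to NodeManager hosts.

    Hypothetical simplification: containers are handed out round-robin, in
    request order, to the hosts that run a NodeManager.
    """
    # 1 YARN application master + 1 Giraph master + one container per worker (-w)
    requests = ["ApplicationMaster", "master"] + [
        f"worker{i}" for i in range(1, num_workers + 1)
    ]
    placement = {host: [] for host in node_managers}
    for request, host in zip(requests, cycle(node_managers)):
        placement[host].append(request)
    return placement


# Assumption: only the DataNodes run a NodeManager; the NameNode host is not listed.
layout = place_containers(2, ["datanode1", "datanode2"])
for host, containers in layout.items():
    print(host, containers)
```

With two workers the job needs four containers but only two hosts can run them, so the master necessarily shares a DataNode with a worker. Adding a third DataNode would add capacity, but it would not by itself guarantee that the master lands on the new node.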

YARN always creates an Application Master for every job.

You can start as many "workers" as you want, depending on your workload, but since you only have 2 DataNodes, you can have at most 2 NodeManagers, which limits how many machines the work can be spread across.

A NodeManager has a maximum amount of memory available to it, and the YARN containers that run a job's tasks each get a portion of that memory to do their processing.
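As a rough illustration of that memory split, the Python sketch below estimates how many equally sized containers fit on one NodeManager. It assumes the NodeManager's total is set by yarn.nodemanager.resource.memory-mb and that the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, as the CapacityScheduler's default resource calculator does. It ignores vcores and overheads. The 8192 MB node size and 1500 MB per-task request are hypothetical numbers, not values from the question.

```python
def normalize(request_mb, min_alloc_mb):
    """Round a container request up to a multiple of the minimum allocation."""
    # Ceiling division without floats: -(-a // b) == ceil(a / b) for positive ints
    return -(-request_mb // min_alloc_mb) * min_alloc_mb


def containers_per_node(nm_memory_mb, request_mb, min_alloc_mb=1024):
    """How many equally sized containers fit in one NodeManager's memory."""
    return nm_memory_mb // normalize(request_mb, min_alloc_mb)


# Hypothetical numbers: an 8 GB NodeManager and 1500 MB requested per Giraph task
per_node = containers_per_node(8192, 1500)
print(per_node)      # each 1500 MB request is rounded up to 2048 MB
print(per_node * 2)  # cluster-wide capacity with 2 NodeManagers
```

Under these assumptions each NodeManager holds 4 containers and the two-node cluster holds 8, so more workers than NodeManagers can run at once. They just share the same machines.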