
Is it correct that master runs on a datanode?

I'm using giraph-1.3 built with the yarn profile. To start, I configured 1 namenode and 2 datanodes on an EC2 cluster. My application works properly: I see the expected output in the logs (and in the output directory). I launched Giraph with the "-w 2" argument because I have two datanodes.

In the userlogs of datanode1 I found the log of the first worker.
In the userlogs of datanode2 I found the log of the second worker and the log of the master too.

I expected to find the master's log on the namenode, i.e. I expected the master to run on the namenode. Is that right?

Or do I have to configure another datanode, and then I will find the master's log on this new datanode?

As I understand it, Hadoop/Giraph works by creating containers on datanodes. Hadoop creates a container for the application master, then Giraph creates a container for the master. Furthermore, Giraph creates a number of worker containers corresponding to the -w parameter.
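The container layout described above can be tallied in a short sketch. It only restates the counts from the description (one application master, one Giraph master, and one worker per unit of -w); the function name `expected_containers` is hypothetical and is not part of Giraph or YARN.

```python
def expected_containers(num_workers: int) -> dict:
    """Tally the containers described above for a Giraph job on YARN.

    Assumption: one container for the YARN application master, one for
    the Giraph master, and one per worker requested with -w.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        "application_master": 1,
        "giraph_master": 1,
        "giraph_workers": num_workers,
        "total": num_workers + 2,
    }


if __name__ == "__main__":
    # With "-w 2" as in the question.
    print(expected_containers(2))
```

With "-w 2" this gives four containers in total, which have to be placed on only two datanodes, so at least one node must host more than one of them.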

YARN always creates an Application Master for every job.

You can start as many "workers" as you want, depending on your workload, but since you only have 2 datanodes, you can have at most 2 NodeManagers, and that caps your maximum parallelism.

A NodeManager has a maximum amount of memory available to it, and the YARN containers for the tasks of a job each get a portion of that in order to do processing.
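As a rough illustration of that memory budget, the sketch below estimates how many containers fit on one NodeManager and whether a cluster can hold all containers of a job at once. It is a simplification: it assumes every container requests the same amount of memory and ignores CPU limits and scheduler rounding. The function names and the example sizes (8192 MB per node, 1024 MB per container) are hypothetical; on a real cluster the per-node limit comes from the `yarn.nodemanager.resource.memory-mb` setting.

```python
def containers_per_node(node_memory_mb: int, container_memory_mb: int) -> int:
    """How many equally sized containers fit in one NodeManager's memory."""
    if container_memory_mb <= 0:
        raise ValueError("container_memory_mb must be positive")
    return node_memory_mb // container_memory_mb


def can_schedule(num_nodes: int, node_memory_mb: int,
                 container_memory_mb: int, containers_needed: int) -> bool:
    """True if all requested containers can run at the same time."""
    capacity = num_nodes * containers_per_node(node_memory_mb, container_memory_mb)
    return capacity >= containers_needed


if __name__ == "__main__":
    # Hypothetical sizes: 2 NodeManagers with 8192 MB each, 1024 MB containers,
    # and 4 containers (application master + Giraph master + 2 workers).
    print(containers_per_node(8192, 1024))
    print(can_schedule(2, 8192, 1024, 4))
```

Under these assumed sizes each node has room for several containers, which is consistent with the master and a worker both showing up in the userlogs of the same datanode.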


 