简体   繁体   English

hadoop集群中的边缘节点

[英]Edge nodes in hadoop cluster

Can Some one explain me the architecture of Edge node in hadoop. 有人可以解释一下hadoop中Edge节点的体系结构吗? I am able to find only the definition on the internet, I have the following queries - 我只能在互联网上找到定义,我有以下查询-

1) Does the edge node have to be part of the cluster (What advantages do we have if it is inside the cluster?). 1)边缘节点是否必须是群集的一部分(如果它在群集中,我们有什么优势?)。 Does it store any blocks of data in hdfs. 它是否在hdfs中存储任何数据块。

2) Can the edge node be outside the cluster? 2)边缘节点可以在群集之外吗?

+1 with the Dell explanation. +1与戴尔的说明。 In my opinion, edge nodes in a Hadoop cluster are typically nodes that are responsible for running the client-side operations of a Hadoop cluster. 我认为,Hadoop集群中的边缘节点通常是负责运行Hadoop集群的客户端操作的节点。 Typically edge-nodes are kept separate from the nodes that contain Hadoop services such as HDFS, MapReduce, etc, mainly to keep computing resources separate. 通常,边缘节点与包含Hadoop服务(例如HDFS,MapReduce等)的节点保持独立,主要是为了保持计算资源独立。 For smaller clusters only having a few nodes, it's common to see nodes playing a hybrid combination of roles for master services (JT, NN, etc.) , slave services (TT, DN, etc) and gateway services. 对于只有几个节点的小型集群,通常会看到节点扮演主服务(JT,NN等),从属服务(TT,DN等)和网关服务的角色的混合组合。

Note that running master and slave Hadoop services on the same node is not an ideal setup, and can cause scaling and resource issues depending on what's at use. 请注意,在同一节点上运行主Hadoop服务和从属Hadoop服务不是理想的设置,并且会根据使用的资源而导致扩展和资源问题。 This kind of configuration is typically seen on a small-scale dev environment. 在小型开发环境中通常会看到这种配置。

With that said, here's some answers to your questions posted: 话虽如此,这是您发布的问题的一些答案:

1) Does the edge node have to be part of the cluster? 1)边缘节点是否必须是群集的一部分?

The edge node does not have to be part of the cluster, however if it is outside of the cluster (meaning it doesn't have any specific Hadoop service roles running on it), it will need some basic pieces such as Hadoop binaries and current Hadoop cluster config files to submit jobs on the cluster. 边缘节点不必是群集的一部分,但是,如果边缘节点不在群集中(这意味着它没有运行任何特定的Hadoop服务角色),则它将需要一些基本组件,例如Hadoop二进制文件和最新版本Hadoop集群配置文件以提交集群上的作业。

2) What advantages do we have if it is inside the cluster? 2)如果它在集群中,我们有什么优势?

Depending on which distribution is in use, edge nodes run within the cluster allow for centralized management of all the Hadoop configuration entries on the cluster nodes which helps to reduce the amount of administration needed to update the config files. 根据使用的分发版本,在群集内运行的边缘节点可以对群集节点上的所有Hadoop配置条目进行集中管理,这有助于减少更新配置文件所需的管理量。 Usually this is a one-to-many approach, where config entries are updated in one location and are pushed out to all (many) nodes in the cluster. 通常这是一对多的方法,其中配置条目在一个位置中更新,并被推送到集群中的所有(许多)节点。

However, when one of the nodes within the cluster is also used as an edge node, there are CPU and memory resources that are consumed by the client operations which detracts the available resources that could be utilized by the running Hadoop services in that node. 但是,当群集中的节点之一也用作边缘节点时,客户端操作会消耗CPU和内存资源,这会分散该节点中正在运行的Hadoop服务可能利用的可用资源。

3) Does it store any blocks of data in hdfs? 3)它是否在hdfs中存储任何数据块?

Unless the edge node is configured with a DataNode service, blocks of data will not be stored on that node. 除非边缘节点配置有DataNode服务,否则数据块将不会存储在该节点上。

4) Should the edge node be outside the cluster? 4)边缘节点是否应位于群集之外?

As mentioned above, it can be dependent on the cluster environment and use-case; 如上所述,它可以取决于集群环境和用例; One of the supporting reasons to configure it outside of the cluster is to keep the client-running and Hadoop services separated. 在群集外部配置它的支持原因之一是使客户端运行和Hadoop服务保持分离。

Keeping an edge node separate allows that node to utilize the full computing resources available for Hadoop processing. 将边缘节点分隔开可以使该节点利用可用于Hadoop处理的全部计算资源。

Hope this helps! 希望这可以帮助!

Edgenodes are not a common Hadoop term. 边缘节点不是Hadoop的常用术语。 I expect you have found the same definition I did which should answer your questions....This is from Dell. 我希望您已经找到了与我相同的定义,应该可以回答您的问题。

EdgeNode – The EdgeNode is the access point for the external applications, tools, and users that need to utilize the Hadoop environment. EdgeNode – EdgeNode是需要使用Hadoop环境的外部应用程序,工具和用户的访问点。 The EdgeNode sits between the Hadoop cluster and the corporate network to provide access control, policy enforcement, logging, and gateway services to the Hadoop environment. EdgeNode位于Hadoop群集和公司网络之间,以提供对Hadoop环境的访问控制,策略实施,日志记录和网关服务。 A typical Hadoop environment will have a minimum of one EdgeNode and more based on performance needs. 根据性能需求,典型的Hadoop环境将至少具有一个EdgeNode,而更多。


So it is really up to you. 因此,这完全取决于您。 The Edgenode might be in the cluster, or maybe not. Edgenode可能在群集中,也可能不在群集中。 It might run Hadoop software, or merely be able to access it. 它可能运行Hadoop软件,或者仅能够访问它。 You do not fundamentally need one as far as I can see - it is just the name given for the ways you can access the cluster. 据我所知,从根本上讲,您并不需要一个-它只是为访问群集提供的名称。

边缘节点不过是hadoop集群的关守,它允许您访问hadoop应用程序,例如hive,pig ..而是我要说的是与集群进行对话的客户端。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM