
How to allocate physical resources for a big data cluster?

I have three servers and I want to deploy a Spark Standalone cluster or Spark on YARN on them. Now I have some questions about how to allocate physical resources for a big data cluster. For example, I want to know whether I can deploy the Spark Master process and a Spark Worker process on the same node, and why.

Server details:

CPU Cores: 24
Memory: 128GB

I need your help. Thanks.

Of course you can: just add the master's host to the slaves file. On my test cluster I have exactly this configuration: the master machine is also a worker node, and there is one worker-only node. Everything works fine.
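For reference, here is a minimal sketch of what that looks like in Spark standalone mode. The host names are made up for illustration; the file itself (`conf/slaves`, renamed `conf/workers` in newer Spark releases) is the standard place to list worker hosts:

```
# conf/slaves -- one worker host per line
# "node1" is also the machine where the Master process runs
node1
node2
node3
```

With this file in place, running `sbin/start-all.sh` on node1 starts the Master there and a Worker on every listed host, including node1 itself.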

However, be aware that if a worker fails in a way that causes a major problem (e.g. a system restart), then you will be in trouble, because the master will be affected as well.

Edit: some more info after the question edit :) If you are using YARN (as suggested), you can use Dynamic Resource Allocation. Here are some slides about it, and here is an article from MapR. How to configure memory properly for a given case is a very long topic; I think these resources will give you a lot of knowledge about it.
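As a starting point, a hedged sketch of what enabling dynamic allocation might look like for nodes of this size (24 cores / 128 GB). The property names are standard Spark configuration keys, but every value below is only an illustrative guess: the usual rule of thumb is to leave a core and several GB per node for the OS and Hadoop/YARN daemons, then size executors from what remains:

```
# conf/spark-defaults.conf -- illustrative values, tune for your workload
spark.dynamicAllocation.enabled      true
spark.shuffle.service.enabled        true   # required for dynamic allocation on YARN
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 12
spark.executor.cores                 5      # ~4 executors per 24-core node, 1 core spare
spark.executor.memory                19g    # leaves headroom for OS, daemons, and overhead
```

With `spark.executor.cores 5` you get roughly four executors per node ((24 − 1 spare) / 5), and `19g` per executor keeps the four executors plus YARN's memory overhead under the node's 128 GB.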

BTW, if you have already installed a Hadoop cluster, maybe try YARN mode ;) But that's outside the scope of the question.

