
Apache Spark: high availability for the Spark client

I am struggling to find any guides or manuals regarding high-availability practices for the client side of Spark applications. I was able to find recommendations for Spark master HA with ZooKeeper, but that is a different problem.

The problem is that if you run several instances of your application connecting to Spark, you have to divide your available cluster resources between all of them, which is overkill.
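For example, the split usually comes from each instance asking for a fixed executor reservation up front. The snippet below is a hypothetical static configuration (the app name and sizes are made up) just to illustrate how several instances end up permanently holding fixed slices of the cluster:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical static setup: every instance of the application reserves the same
    // fixed slice of the cluster for its whole lifetime, busy or not.
    object StaticInstance {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("app-instance-1")             // made-up name
          .set("spark.executor.instances", "10")    // 10 executors held for the app's lifetime
          .set("spark.executor.memory", "8g")       // 8 GB each, per instance
        val sc = new SparkContext(conf)
        // ... application logic ...
        sc.stop()
      }
    }

Run two or three of these side by side and the reservations add up even when most of the instances are idle.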

Is there anything like the guide I'm looking for?

It depends on what your master is set to. If you're using yarn-cluster mode, high availability is free, or somewhat free, because the driver runs inside the cluster. If you're running things in yarn-client or local mode and that machine goes down, you're pretty much done. Really, it boils down to what you're trying to do. If you want compute resources separate from the Hadoop data nodes, I'd look into a Mesos cluster. It's a great way to run ad-hoc or long-running jobs without locking up YARN resources.
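To make the yarn-cluster point concrete, here is a minimal sketch (the class name is made up, and the retry settings are shown only as an assumption about standard YARN configuration keys) of an app submitted in cluster mode, so the driver lives inside the YARN ApplicationMaster rather than on your client machine and YARN is allowed to restart the application if that node dies:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumed submission; cluster mode moves the driver off the client machine,
    // and the --conf flags let YARN retry the ApplicationMaster (and driver):
    //   spark-submit --master yarn --deploy-mode cluster \
    //     --conf spark.yarn.maxAppAttempts=4 \
    //     --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
    //     --class example.HaApp ha-app.jar
    object HaApp {
      def main(args: Array[String]): Unit = {
        // Made-up app name; master and deploy mode come from spark-submit above.
        val conf = new SparkConf().setAppName("ha-app")
        val sc = new SparkContext(conf)
        // ... job logic ...
        sc.stop()
      }
    }

With that, the machine you submit from can go away after submission without killing the job; in yarn-client or local mode the driver is that machine, which is why it becomes the single point of failure.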

What is your data source and what are you trying to accomplish?
