
Make a Spark environment for a cluster

I made a Spark application that analyzes file data. Since the input data could be large, running my application on a single machine isn't enough. With one more physical machine available, how should I architect this?

I'm considering using Mesos as the cluster manager, but I'm a complete newbie with HDFS. Is there any way to set this up without HDFS (for sharing the file data)?

Spark supports a few cluster managers: YARN, Mesos, and Standalone. You may start with Standalone mode, which means you work with your cluster's own file system rather than HDFS (the input just needs to be readable at the same path from every machine).

If you are running on Amazon EC2, you may refer to the following article in order to use Spark's built-in scripts that launch a Spark cluster automatically.
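
As a rough sketch, older Spark releases bundled an EC2 launcher under ec2/ in the distribution; the key pair, identity file, and cluster name below are placeholders you would substitute:

export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>
# launch a cluster with 2 worker nodes using the bundled spark-ec2 script
./ec2/spark-ec2 -k <keypair-name> -i <path-to-keypair.pem> -s 2 launch <cluster-name>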

If you are running in an on-prem environment, the way to run in Standalone mode is as follows:

-Start a standalone master

./sbin/start-master.sh

-The master will print out a spark://HOST:PORT URL for itself. For each worker (machine) on your cluster, run the following command with that URL:

./sbin/start-slave.sh <master-spark-URL>

-To validate that the worker was added to the cluster, open http://localhost:8080 on your master machine; the Spark master UI shows more info about the cluster and its workers.
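
Once the master and workers are up, you can submit your application against the cluster. A minimal sketch, assuming your application is packaged as a JAR whose main class is com.example.Analyzer (the class name, JAR path, and input path are placeholders), and assuming the input file is readable at the same path on every worker (a shared NFS mount or a copy on each node, since you are not using HDFS):

# submit the packaged application to the standalone master
./bin/spark-submit \
  --master spark://<master-host>:7077 \
  --class com.example.Analyzer \
  --total-executor-cores 4 \
  /path/to/analyzer.jar /data/input.log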

There are many more parameters to play with; for more info, please refer to this documentation.
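
For example, per-worker resources can be capped through conf/spark-env.sh on each machine before the daemons are started (the values below are only illustrative, and the variable names follow the current Spark docs):

# conf/spark-env.sh on every node
export SPARK_MASTER_HOST=<master-host>   # address the master binds to
export SPARK_WORKER_CORES=4              # CPU cores each worker offers to Spark
export SPARK_WORKER_MEMORY=8g            # memory each worker offers to Spark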

Hope I have managed to help! :)
