Apache Spark Standalone Cluster Initial Job not accepting resources

First, I know this question has been asked before, but the answers don't seem to apply to my situation.

I'm using DigitalOcean, where I have three servers (Ubuntu 14.04 with 2 cores, 2 GB RAM, 40 GB disk): one master and two slaves. All three run Spark 1.6.1 compiled from source. The builds initially failed due to lack of memory, so I configured each server with 16 GB of swap, after which the builds completed fine.

I started the standalone master by explicitly setting the host to the public IP address using:

./sbin/start-master.sh -h 104.236.221.106
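
As an aside, I believe the same binding can be made permanent in conf/spark-env.sh rather than passed on the command line each time; a minimal sketch using the Spark 1.x variable names from the standalone-mode docs (the values are mine, the variables are standard):

# conf/spark-env.sh on the master, sourced by the start scripts
export SPARK_MASTER_IP=104.236.221.106   # address the master binds to (Spark 1.x name)
export SPARK_MASTER_PORT=7077            # default, shown for completeness
export SPARK_MASTER_WEBUI_PORT=8080      # default, shown for completeness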

That's the actual IP address - you can visit http://104.236.221.106:8080/ to see the state of the cluster. Anyway, the slaves were started with:

./sbin/start-slave.sh spark://104.236.221.106:7077 -m 10g

Because Spark detected only 2 GB of system memory, each worker was grabbing just 1 GB by default, so I explicitly told it to use more so the system would leverage swap if needed.
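
From what I can tell, the -m flag corresponds to the SPARK_WORKER_MEMORY variable, so the persistent equivalent would be something like this in conf/spark-env.sh on each slave (a sketch, not copied from my actual config):

# conf/spark-env.sh on each slave
# Cap on the memory the worker offers to executors; anything beyond the
# 2 GB of physical RAM will have to come out of the 16 GB of swap
export SPARK_WORKER_MEMORY=10g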

Looking at the web portal, I can see that it reports 2 workers with 4 total cores in the cluster, etc.

[Screenshot: portal snapshot of the cluster]

Everything seems like it should be working great, so I launch the shell for interactive work from the master server using:

./bin/spark-shell --master spark://104.236.221.106:7077 --executor-memory 4g
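
As I understand it, --executor-memory is shorthand for the spark.executor.memory property, so this launch should be equivalent (a sketch of the --conf form):

./bin/spark-shell --master spark://104.236.221.106:7077 --conf spark.executor.memory=4g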

Either way, the goal is to have plenty of memory. The shell launches and gives me a prompt, so I set a value, val NUM_SAMPLES = 10000 (not a big number, but something), and then try the example code from Apache to estimate Pi.

// Sample NUM_SAMPLES points uniformly in the unit square and count how
// many land inside the unit circle; the ratio approximates Pi / 4
val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

and what I get out of it is:

[Stage 0:>                                                          (0 + 0) / 2]16/03/19 07:23:57 WARN TaskSchedulerImpl: 
    Initial job has not accepted any resources; 
    check your cluster UI to ensure that workers are registered 
    and have sufficient resources
16/03/19 07:24:12 WARN TaskSchedulerImpl: 
    Initial job has not accepted any resources; 
    check your cluster UI to ensure that workers are registered 
    and have sufficient resources
16/03/19 07:24:27 WARN TaskSchedulerImpl: 
    Initial job has not accepted any resources; 
    check your cluster UI to ensure that workers are registered 
    and have sufficient resources
[Stage 0:>                                                          (0 + 0) / 2]16/03/19 07:24:42 WARN TaskSchedulerImpl: 
    Initial job has not accepted any resources; 
    check your cluster UI to ensure that workers are registered 
    and have sufficient resources

So this doesn't make sense for a few reasons. First, I know the cluster exists because I can see it in the web portal. I can see the job that was created with the memory allocation I requested. I can see in top on the slaves that java is doing something, which makes me think they received work. Finally, the task I'm asking for is trivially simple; it should barely consume any resources.

What did I do wrong in my configuration, or in the way I'm trying to run this code?

I'm including examples of the master and worker logs:

WORKER: http://pastebin.com/xwnBMaKQ

MASTER: http://pastebin.com/0Ja0KD9k

It looks like the workers are still trying to hit a private IP address, despite launching the master with an explicit IP address and (after some help) launching the workers with their public IP addresses too.
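
If that's the problem, Spark 1.6 documents a couple of environment variables for controlling which address each daemon binds to and advertises. A sketch of what I plan to try in conf/spark-env.sh (the variable names are from the standalone-mode docs; the value shown is the master's public IP, and each slave would substitute its own):

# conf/spark-env.sh on each machine in the cluster
# SPARK_LOCAL_IP:   the address this machine's Spark daemons bind to
# SPARK_PUBLIC_DNS: the address this machine advertises to the others
export SPARK_LOCAL_IP=104.236.221.106    # master shown; each slave uses its own IP
export SPARK_PUBLIC_DNS=104.236.221.106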

Have you tried explicitly using private IP addresses? I would first make sure you can get your cluster into a consistent state before worrying about access from the public IP address.
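
A minimal sketch of that suggestion, with 10.132.0.1 standing in as a hypothetical private address for the master (the cluster's real private addresses aren't shown in this post):

# on the master, binding to its (hypothetical) private address
./sbin/start-master.sh -h 10.132.0.1

# on each slave, pointing at the master's private address
./sbin/start-slave.sh spark://10.132.0.1:7077 -m 10g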
