
Spark standalone mode not distributing job to other worker node

I am running a Spark job in standalone mode. I have configured my worker node to connect to the master node. They connect successfully, but when I run the job on the Spark master, the job does not get distributed. I keep getting the following message:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I have tried running the job locally on the worker node and it runs fine, which means resources are available. The Spark master UI also shows that the worker has accepted the job. Passwordless ssh is enabled in both directions between the master and worker nodes. I suspect it might be a firewall issue, or that the Spark driver port is not open. My worker node logs show:

16/03/21 10:05:40 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-7-oracle/bin/java" "-cp" "/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/sbin/../conf/:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/spark-assembly-1.5.0-hadoop2.6.0.jar:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" "-Xms8192M" "-Xmx8192M" "-Dspark.driver.port=51810" "-Dspark.cassandra.connection.port=9042" "-XX:MaxPermSize=256m" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@10.0.1.192:51810/user/CoarseGrainedScheduler" "--executor-id" "2" "--hostname" "10.0.1.194" "--cores" "4" "--app-id" "app-20160321100135-0001" "--worker-url" "akka.tcp://sparkWorker@10.0.1.194:39423/user/Worker"

The executor on the worker node shows the following log in stderr:

16/03/21 10:13:52 INFO Slf4jLogger: Slf4jLogger started
16/03/21 10:13:52 INFO Remoting: Starting remoting
16/03/21 10:13:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@10.0.1.194:59715]
16/03/21 10:13:52 INFO Utils: Successfully started service 'driverPropsFetcher' on port 59715.

You can specify a specific driver port within the Spark context:

spark.driver.port  = "port"
val conf = new SparkConf().set("spark.driver.port", "51810") 
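
For completeness, here is a minimal, self-contained sketch of the same idea (assumptions: Spark's Scala API as used in the line above, a placeholder application name, and the master/driver addresses and port taken from the launch command in the question):

import org.apache.spark.{SparkConf, SparkContext}

// Pin the driver port so a single, known port can be opened in the firewall
// instead of the randomly assigned default.
val conf = new SparkConf()
  .setAppName("driver-port-example")       // hypothetical application name
  .setMaster("spark://10.0.1.192:7077")    // assumed master URL for this cluster
  .set("spark.driver.port", "51810")       // fixed driver port, as in the launch command above

val sc = new SparkContext(conf)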

PS: When manually starting the Spark worker on the worker machine and connecting it to the master, you don't need any further passwordless authentication or similar between the master and the worker. That would only be necessary if you used the master to start all slaves (start-slaves.sh). So this shouldn't be a problem.

Many people have this issue when setting up a new cluster. If you can find the Spark slaves in the web UI but they are not accepting jobs, there is a high chance that a firewall is blocking the communication. Take a look at my other answer: Apache Spark on Mesos: Initial job has not accepted any resources.

While most of the other answers focus on resource allocation (cores, memory) on the Spark slaves, I would like to highlight that a firewall can cause exactly the same issue, especially when you are running Spark on a cloud platform.

If you can find the Spark slaves in the web UI, you have probably opened the standard ports 8080, 8081, 7077 and 4040. Nonetheless, when you actually run a job, it uses SPARK_WORKER_PORT, spark.driver.port and spark.blockManager.port, which by default are randomly assigned. If your firewall is blocking these ports, the master cannot retrieve any job-specific response from the slaves and returns the error.
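
To make that concrete, here is a hedged sketch of pinning those ports to fixed values so that matching firewall rules can be created (the port numbers and application name are placeholders, not a recommendation; SPARK_WORKER_PORT cannot be set from the application and is typically exported in conf/spark-env.sh on each worker):

import org.apache.spark.{SparkConf, SparkContext}

// Fix the otherwise randomly assigned ports so the firewall only needs to
// allow a small, known set of ports between the driver, master and workers.
val conf = new SparkConf()
  .setAppName("fixed-ports-example")          // hypothetical application name
  .setMaster("spark://10.0.1.192:7077")       // assumed master URL for this cluster
  .set("spark.driver.port", "51810")          // executors connect back to the driver on this port
  .set("spark.blockManager.port", "51811")    // block transfers between nodes use this port

val sc = new SparkContext(conf)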

You can run a quick test by opening all the ports and checking whether the slaves accept jobs.
