Spark job isn't being scheduled to workers when submitted from a Docker container
There is a Spark master installed on a host. Spark is running in standalone mode, with workers on separate nodes. All the Spark infrastructure is running without Docker. There is also a Docker container for Airflow running on the Spark master host. The container is started like this:
docker run -d --network host \
  -v /usr/share/javazi-1.8/:/usr/share/javazi-1.8 \
  -v /home/airflow/dags/:/usr/local/airflow/dags \
  -v /home/spark-2.3.3/:/home/spark-2.3.3 \
  -v /usr/local/hadoop/:/usr/local/hadoop \
  -v /usr/lib/jvm/java/:/usr/lib/jvm/java \
  -v /usr/local/opt/:/usr/local/opt \
  airflow
So spark-submit is made available to the container via a volume mount, and the container uses the host network.

I am trying to submit my Spark job from the Docker container, like this:
/home/spark-2.3.3/bin/spark-submit \
  --master=spark://spark-master.net:7077 \
  --class=com.mysparkjob.Main \
  --driver-memory=4G \
  --executor-cores=6 \
  --total-executor-cores=12 \
  --executor-memory=10G \
  /home/spark/my-job.jar
but execution freezes after these logs:
2020-07-06 20:34:21 INFO SparkContext:54 - Running Spark version 2.3.3
2020-07-06 20:34:21 WARN SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
2020-07-06 20:34:21 INFO SparkContext:54 - Submitted application: My app
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing view acls to: root
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing modify acls to: root
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing view acls groups to:
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing modify acls groups to:
2020-07-06 20:34:21 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2020-07-06 20:34:21 INFO Utils:54 - Successfully started service 'sparkDriver' on port 46677.
2020-07-06 20:34:21 INFO SparkEnv:54 - Registering MapOutputTracker
2020-07-06 20:34:21 INFO SparkEnv:54 - Registering BlockManagerMaster
2020-07-06 20:34:21 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2020-07-06 20:34:21 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2020-07-06 20:34:21 INFO DiskBlockManager:54 - Created local directory at /home/sparkdata/blockmgr-3b52d93a-149e-49a2-9664-ce19fc12e76e
2020-07-06 20:34:21 INFO MemoryStore:54 - MemoryStore started with capacity 2004.6 MB
2020-07-06 20:34:21 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2020-07-06 20:34:21 INFO log:192 - Logging initialized @83360ms
2020-07-06 20:34:21 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2020-07-06 20:34:21 INFO Server:419 - Started @83405ms
2020-07-06 20:34:21 INFO AbstractConnector:278 - Started ServerConnector@240a2619{HTTP/1.1,[http/1.1]}{my_ip:4040}
2020-07-06 20:34:21 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3bd08435{/jobs,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@65859b44{/jobs/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@d9f5fce{/jobs/job,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@45b7c97f{/jobs/job/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c212536{/stages,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b377a53{/stages/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1b0e031b{/stages/stage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25214797{/stages/stage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4e5c8ef3{/stages/pool,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60928a61{/stages/pool/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27358a19{/storage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@8077c97{/storage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22865072{/storage/rdd,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@563317c1{/storage/rdd/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d5d3a5c{/environment,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e0d16a4{/environment/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7e18ced7{/executors,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@305b43ca{/executors/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4601047{/executors/threadDump,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25e8e59{/executors/threadDump/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a0896b3{/static,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@635ff2a5{/,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@55adcf9e{/api,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@58601e7a{/jobs/job/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62735b13{/stages/stage/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO SparkUI:54 - Bound SparkUI to my_ip, and started at http://my_ip:4040
2020-07-06 20:34:21 INFO SparkContext:54 - Added JAR file:/home/spark/my-job.jar at spark://my_ip:46677/jars/my-job.jar with timestamp 1594067661464
2020-07-06 20:34:21 WARN FairSchedulableBuilder:66 - Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
2020-07-06 20:34:21 INFO FairSchedulableBuilder:54 - Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1
2020-07-06 20:34:21 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://spark-master.net:7077...
2020-07-06 20:34:21 INFO TransportClientFactory:267 - Successfully created connection to spark-master.net/my_ip:7077 after 14 ms (0 ms spent in bootstraps)
2020-07-06 20:34:21 INFO StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20200706223421-1147
2020-07-06 20:34:21 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33659.
2020-07-06 20:34:21 INFO NettyBlockTransferService:54 - Server created on my_ip:33659
2020-07-06 20:34:21 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-07-06 20:34:21 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO BlockManagerMasterEndpoint:54 - Registering block manager my_ip:33659 with 2004.6 MB RAM, BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO BlockManager:54 - external shuffle service port = 8888
2020-07-06 20:34:21 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bc16fe2{/metrics/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO EventLoggingListener:54 - Logging events to hdfs://my_hdfs_ip:54310/sparkEventLogs/app-20200706223421-1147
2020-07-06 20:34:21 INFO Utils:54 - Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
2020-07-06 20:34:21 INFO StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
And the Spark job isn't executed any further. It looks like some network issue. Maybe the workers can't reach the Spark master if the job was submitted from a container? I would be glad to get any advice or help from you guys. Thanks.
Most probably the executors can't reach the driver that is running in the container. You need to look at the spark.driver.host option and set it to the IP of the container that is visible from outside; otherwise Spark inside the container will advertise the internal Docker network address. You also need to set spark.driver.bindAddress to an address local to the container, so that Spark is able to perform the bind.
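A minimal sketch of how these two options might be passed on the submit command. HOST_IP here is a placeholder for the externally reachable address of your Docker host, not a value from the question; binding to 0.0.0.0 is one common choice for the container-local side:

```shell
# Hypothetical example: HOST_IP = address the workers can route to.
# spark.driver.host      -> address advertised to executors
# spark.driver.bindAddress -> address the driver actually binds inside the container
/home/spark-2.3.3/bin/spark-submit \
  --master=spark://spark-master.net:7077 \
  --conf spark.driver.host=HOST_IP \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --class=com.mysparkjob.Main \
  /home/spark/my-job.jar
```

Since the container already runs with --network host, it shares the host's interfaces, so setting spark.driver.host to the host's own IP should be enough for the executors to connect back.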
From the documentation:

spark.driver.bindAddress - It also allows a different address from the local one to be advertised to executors or external systems. This is useful, for example, when running containers with bridged networking. For this to properly work, the different ports used by the driver (RPC, block manager and UI) need to be forwarded from the container's host.
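For the bridged-networking case the quote mentions, the driver's ports would need to be pinned to fixed values and published from the container. A hedged sketch (the port numbers and HOST_IP are arbitrary illustrative values, not taken from the question):

```shell
# Pin the driver's RPC and block manager ports so they can be published
# (35000/35001 are example values; 4040 is the default Spark UI port).
docker run -d -p 35000:35000 -p 35001:35001 -p 4040:4040 airflow

# Then, inside the container, submit with the same fixed ports:
/home/spark-2.3.3/bin/spark-submit \
  --master=spark://spark-master.net:7077 \
  --conf spark.driver.port=35000 \
  --conf spark.blockManager.port=35001 \
  --conf spark.driver.host=HOST_IP \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --class=com.mysparkjob.Main \
  /home/spark/my-job.jar
```

With --network host, as in the question, this port forwarding is unnecessary because the container already shares the host's network namespace.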