Spark job isn't being scheduled to workers when submitted from a Docker container
The Spark master is installed on the host. Spark runs in standalone mode, with the workers on different nodes. The whole Spark infrastructure runs without Docker. A Docker container running Airflow runs on the Spark master host. The container is started like this:
docker run -d --network host -v /usr/share/javazi-1.8/:/usr/share/javazi-1.8 \
  -v /home/airflow/dags/:/usr/local/airflow/dags -v /home/spark-2.3.3/:/home/spark-2.3.3 \
  -v /usr/local/hadoop/:/usr/local/hadoop -v /usr/lib/jvm/java/:/usr/lib/jvm/java \
  -v /usr/local/opt/:/usr/local/opt airflow
So spark-submit is available inside the container via a volume mount, and the container uses the host network.
I'm trying to submit my Spark job from the Docker container like this:
/home/spark-2.3.3/bin/spark-submit --master=spark://spark-master.net:7077
--class=com.mysparkjob.Main --driver-memory=4G --executor-cores=6
--total-executor-cores=12 --executor-memory=10G /home/spark/my-job.jar
But execution hangs with these logs:
2020-07-06 20:34:21 INFO SparkContext:54 - Running Spark version 2.3.3
2020-07-06 20:34:21 WARN SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
2020-07-06 20:34:21 INFO SparkContext:54 - Submitted application: My app
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing view acls to: root
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing modify acls to: root
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing view acls groups to:
2020-07-06 20:34:21 INFO SecurityManager:54 - Changing modify acls groups to:
2020-07-06 20:34:21 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2020-07-06 20:34:21 INFO Utils:54 - Successfully started service 'sparkDriver' on port 46677.
2020-07-06 20:34:21 INFO SparkEnv:54 - Registering MapOutputTracker
2020-07-06 20:34:21 INFO SparkEnv:54 - Registering BlockManagerMaster
2020-07-06 20:34:21 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2020-07-06 20:34:21 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2020-07-06 20:34:21 INFO DiskBlockManager:54 - Created local directory at /home/sparkdata/blockmgr-3b52d93a-149e-49a2-9664-ce19fc12e76e
2020-07-06 20:34:21 INFO MemoryStore:54 - MemoryStore started with capacity 2004.6 MB
2020-07-06 20:34:21 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2020-07-06 20:34:21 INFO log:192 - Logging initialized @83360ms
2020-07-06 20:34:21 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2020-07-06 20:34:21 INFO Server:419 - Started @83405ms
2020-07-06 20:34:21 INFO AbstractConnector:278 - Started ServerConnector@240a2619{HTTP/1.1,[http/1.1]}{my_ip:4040}
2020-07-06 20:34:21 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3bd08435{/jobs,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@65859b44{/jobs/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@d9f5fce{/jobs/job,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@45b7c97f{/jobs/job/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c212536{/stages,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b377a53{/stages/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1b0e031b{/stages/stage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25214797{/stages/stage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4e5c8ef3{/stages/pool,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60928a61{/stages/pool/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27358a19{/storage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@8077c97{/storage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22865072{/storage/rdd,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@563317c1{/storage/rdd/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d5d3a5c{/environment,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e0d16a4{/environment/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7e18ced7{/executors,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@305b43ca{/executors/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4601047{/executors/threadDump,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25e8e59{/executors/threadDump/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a0896b3{/static,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@635ff2a5{/,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@55adcf9e{/api,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@58601e7a{/jobs/job/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62735b13{/stages/stage/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO SparkUI:54 - Bound SparkUI to my_ip, and started at http://my_ip:4040
2020-07-06 20:34:21 INFO SparkContext:54 - Added JAR file:/home/spark/my-job.jar at spark://my_ip:46677/jars/my-job.jar with timestamp 1594067661464
2020-07-06 20:34:21 WARN FairSchedulableBuilder:66 - Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
2020-07-06 20:34:21 INFO FairSchedulableBuilder:54 - Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1
2020-07-06 20:34:21 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://spark-master.net:7077...
2020-07-06 20:34:21 INFO TransportClientFactory:267 - Successfully created connection to spark-master.net/my_ip:7077 after 14 ms (0 ms spent in bootstraps)
2020-07-06 20:34:21 INFO StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20200706223421-1147
2020-07-06 20:34:21 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33659.
2020-07-06 20:34:21 INFO NettyBlockTransferService:54 - Server created on my_ip:33659
2020-07-06 20:34:21 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-07-06 20:34:21 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO BlockManagerMasterEndpoint:54 - Registering block manager my_ip:33659 with 2004.6 MB RAM, BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO BlockManager:54 - external shuffle service port = 8888
2020-07-06 20:34:21 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bc16fe2{/metrics/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO EventLoggingListener:54 - Logging events to hdfs://my_hdfs_ip:54310/sparkEventLogs/app-20200706223421-1147
2020-07-06 20:34:21 INFO Utils:54 - Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
2020-07-06 20:34:21 INFO StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
and the Spark job doesn't proceed any further. It looks like some kind of network problem. Could it be that the workers can't reach the Spark master when the job is submitted from the container? I'd appreciate any advice or help. Thanks.
Most likely the executors can't reach the driver running inside the container. You need to look at the spark.driver.host option and set it to an IP of the container that is visible from the outside; otherwise Spark inside the container will advertise the internal Docker network address. You also need to set spark.driver.bindAddress to the container's local address so that Spark is able to perform the bind.
From the documentation:
spark.driver.bindAddress — It also allows a different address from the local one to be advertised to executors or external systems. This is useful, for instance, when running containers with bridged networking. For this to properly work, the different ports used by the driver (RPC, block manager and UI) need to be forwarded from the container's host.
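As a sketch under these assumptions (the `<host-reachable-ip>` placeholder is hypothetical — substitute the address of the host that the workers can actually reach; binding to `0.0.0.0` is one common choice for the bind address inside a container), the submit command could pass both properties explicitly:

```shell
# Hypothetical example, not a verified fix for this cluster:
# spark.driver.host     = address the driver ADVERTISES to the master/executors
# spark.driver.bindAddress = address the driver BINDS to inside the container
/home/spark-2.3.3/bin/spark-submit \
  --master spark://spark-master.net:7077 \
  --class com.mysparkjob.Main \
  --conf spark.driver.host=<host-reachable-ip> \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --driver-memory 4G --executor-cores 6 \
  --total-executor-cores 12 --executor-memory 10G \
  /home/spark/my-job.jar
```

Since the container here already uses `--network host`, the two addresses may coincide with the host's own IP; with bridged networking they would differ, and the driver's RPC, block manager, and UI ports would also need to be published from the container.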