
Spark job isn't being scheduled to workers when submitted from a Docker container

There is a Spark master installed on a host. Spark runs in standalone mode, with workers on separate nodes. All of the Spark infrastructure runs without Docker. There is also a Docker container for Airflow running on the Spark master host. The container is started like this:

 docker run -d --network host \
   -v /usr/share/javazi-1.8/:/usr/share/javazi-1.8 \
   -v /home/airflow/dags/:/usr/local/airflow/dags \
   -v /home/spark-2.3.3/:/home/spark-2.3.3 \
   -v /usr/local/hadoop/:/usr/local/hadoop \
   -v /usr/lib/jvm/java/:/usr/lib/jvm/java \
   -v /usr/local/opt/:/usr/local/opt \
   airflow

So spark-submit is available inside the container via a volume mount, and the container uses the host network.
I am trying to submit my Spark job from the Docker container like this:

/home/spark-2.3.3/bin/spark-submit --master=spark://spark-master.net:7077 \
  --class=com.mysparkjob.Main --driver-memory=4G --executor-cores=6 \
  --total-executor-cores=12 --executor-memory=10G /home/spark/my-job.jar

but execution freezes after these logs:

2020-07-06 20:34:21 INFO  SparkContext:54 - Running Spark version 2.3.3
2020-07-06 20:34:21 WARN  SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
2020-07-06 20:34:21 INFO  SparkContext:54 - Submitted application: My app
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing view acls to: root
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing modify acls to: root
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing view acls groups to: 
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing modify acls groups to: 
2020-07-06 20:34:21 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2020-07-06 20:34:21 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 46677.
2020-07-06 20:34:21 INFO  SparkEnv:54 - Registering MapOutputTracker
2020-07-06 20:34:21 INFO  SparkEnv:54 - Registering BlockManagerMaster
2020-07-06 20:34:21 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2020-07-06 20:34:21 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2020-07-06 20:34:21 INFO  DiskBlockManager:54 - Created local directory at /home/sparkdata/blockmgr-3b52d93a-149e-49a2-9664-ce19fc12e76e
2020-07-06 20:34:21 INFO  MemoryStore:54 - MemoryStore started with capacity 2004.6 MB
2020-07-06 20:34:21 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2020-07-06 20:34:21 INFO  log:192 - Logging initialized @83360ms
2020-07-06 20:34:21 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2020-07-06 20:34:21 INFO  Server:419 - Started @83405ms
2020-07-06 20:34:21 INFO  AbstractConnector:278 - Started ServerConnector@240a2619{HTTP/1.1,[http/1.1]}{my_ip:4040}
2020-07-06 20:34:21 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3bd08435{/jobs,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@65859b44{/jobs/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@d9f5fce{/jobs/job,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@45b7c97f{/jobs/job/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c212536{/stages,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b377a53{/stages/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1b0e031b{/stages/stage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25214797{/stages/stage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4e5c8ef3{/stages/pool,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60928a61{/stages/pool/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27358a19{/storage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@8077c97{/storage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22865072{/storage/rdd,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@563317c1{/storage/rdd/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d5d3a5c{/environment,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e0d16a4{/environment/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7e18ced7{/executors,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@305b43ca{/executors/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4601047{/executors/threadDump,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25e8e59{/executors/threadDump/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a0896b3{/static,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@635ff2a5{/,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@55adcf9e{/api,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@58601e7a{/jobs/job/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62735b13{/stages/stage/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  SparkUI:54 - Bound SparkUI to my_ip, and started at http://my_ip:4040
2020-07-06 20:34:21 INFO  SparkContext:54 - Added JAR file:/home/spark/my-job.jar at spark://my_ip:46677/jars/my-job.jar with timestamp 1594067661464
2020-07-06 20:34:21 WARN  FairSchedulableBuilder:66 - Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
2020-07-06 20:34:21 INFO  FairSchedulableBuilder:54 - Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1
2020-07-06 20:34:21 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://spark-master.net:7077...
2020-07-06 20:34:21 INFO  TransportClientFactory:267 - Successfully created connection to spark-master.net/my_ip:7077 after 14 ms (0 ms spent in bootstraps)
2020-07-06 20:34:21 INFO  StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20200706223421-1147
2020-07-06 20:34:21 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33659.
2020-07-06 20:34:21 INFO  NettyBlockTransferService:54 - Server created on my_ip:33659
2020-07-06 20:34:21 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-07-06 20:34:21 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  BlockManagerMasterEndpoint:54 - Registering block manager my_ip:33659 with 2004.6 MB RAM, BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  BlockManager:54 - external shuffle service port = 8888
2020-07-06 20:34:21 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bc16fe2{/metrics/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  EventLoggingListener:54 - Logging events to hdfs://my_hdfs_ip:54310/sparkEventLogs/app-20200706223421-1147
2020-07-06 20:34:21 INFO  Utils:54 - Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
2020-07-06 20:34:21 INFO  StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

And the Spark job isn't executed any further. It looks like some network issue. Maybe the workers can't reach the Spark master if the job was submitted from a container? I would be glad to get any advice or help from you guys. Thanks

Most probably the executors couldn't reach the driver running in the container. You need to look at the option spark.driver.host and set it to an IP of the container that is visible from outside; otherwise, Spark inside the container will advertise the internal Docker network address. You also need to set spark.driver.bindAddress to an address local to the container, so that Spark is able to perform the bind.
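
For example, here is a minimal sketch of the submit command with both options set explicitly (the value 192.168.1.10 is a placeholder for the host's externally visible IP, not a value from the original setup):

 # Hypothetical example; replace 192.168.1.10 with the host IP
 # that the workers can actually reach.
 /home/spark-2.3.3/bin/spark-submit \
   --master=spark://spark-master.net:7077 \
   --conf spark.driver.host=192.168.1.10 \
   --conf spark.driver.bindAddress=0.0.0.0 \
   --class=com.mysparkjob.Main --driver-memory=4G --executor-cores=6 \
   --total-executor-cores=12 --executor-memory=10G /home/spark/my-job.jar

Binding to 0.0.0.0 is the general choice when the container has its own network namespace; with --network host, the bind address can simply match the host IP.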

From the documentation:

spark.driver.bindAddress - It also allows a different address from the local one to be advertised to executors or external systems. This is useful, for example, when running containers with bridged networking. For this to properly work, the different ports used by the driver (RPC, block manager and UI) need to be forwarded from the container's host.
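
As an illustration of that last point, here is a sketch of what pinning and publishing the driver ports could look like if the container ran with bridged networking instead of --network host (the port numbers 35000 and 36000 are arbitrary example values, not values from the original setup):

 # Publish fixed driver ports from the container (example values):
 docker run -d -p 35000:35000 -p 36000:36000 -p 4040:4040 ... airflow

 # Pin Spark's RPC, block manager and UI ports to the published ones:
 /home/spark-2.3.3/bin/spark-submit \
   --conf spark.driver.port=35000 \
   --conf spark.blockManager.port=36000 \
   --conf spark.ui.port=4040 \
   ...

With --network host, as in the setup above, no port publishing is needed, which is why setting spark.driver.host to the host's IP is usually sufficient there.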
