
Spark job isn't being scheduled to workers when submitted from a docker container

The spark master is installed on the host machine. Spark runs in standalone mode, with the workers on different nodes. The whole spark infrastructure runs without docker. A docker container running airflow runs on the spark master host. The container is started like this:

 docker run -d --network host \
   -v /usr/share/javazi-1.8/:/usr/share/javazi-1.8 \
   -v /home/airflow/dags/:/usr/local/airflow/dags \
   -v /home/spark-2.3.3/:/home/spark-2.3.3 \
   -v /usr/local/hadoop/:/usr/local/hadoop \
   -v /usr/lib/jvm/java/:/usr/lib/jvm/java \
   -v /usr/local/opt/:/usr/local/opt \
   airflow

So spark-submit is made available inside the container as a volume, and the container uses the host network. I am trying to submit my spark job from the docker container like this:

/home/spark-2.3.3/bin/spark-submit \
  --master=spark://spark-master.net:7077 \
  --class=com.mysparkjob.Main \
  --driver-memory=4G \
  --executor-cores=6 \
  --total-executor-cores=12 \
  --executor-memory=10G \
  /home/spark/my-job.jar

But execution freezes at these log lines:

2020-07-06 20:34:21 INFO  SparkContext:54 - Running Spark version 2.3.3
2020-07-06 20:34:21 WARN  SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
2020-07-06 20:34:21 INFO  SparkContext:54 - Submitted application: My app
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing view acls to: root
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing modify acls to: root
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing view acls groups to: 
2020-07-06 20:34:21 INFO  SecurityManager:54 - Changing modify acls groups to: 
2020-07-06 20:34:21 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2020-07-06 20:34:21 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 46677.
2020-07-06 20:34:21 INFO  SparkEnv:54 - Registering MapOutputTracker
2020-07-06 20:34:21 INFO  SparkEnv:54 - Registering BlockManagerMaster
2020-07-06 20:34:21 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2020-07-06 20:34:21 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2020-07-06 20:34:21 INFO  DiskBlockManager:54 - Created local directory at /home/sparkdata/blockmgr-3b52d93a-149e-49a2-9664-ce19fc12e76e
2020-07-06 20:34:21 INFO  MemoryStore:54 - MemoryStore started with capacity 2004.6 MB
2020-07-06 20:34:21 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2020-07-06 20:34:21 INFO  log:192 - Logging initialized @83360ms
2020-07-06 20:34:21 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2020-07-06 20:34:21 INFO  Server:419 - Started @83405ms
2020-07-06 20:34:21 INFO  AbstractConnector:278 - Started ServerConnector@240a2619{HTTP/1.1,[http/1.1]}{my_ip:4040}
2020-07-06 20:34:21 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3bd08435{/jobs,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@65859b44{/jobs/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@d9f5fce{/jobs/job,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@45b7c97f{/jobs/job/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c212536{/stages,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b377a53{/stages/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1b0e031b{/stages/stage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25214797{/stages/stage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4e5c8ef3{/stages/pool,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60928a61{/stages/pool/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27358a19{/storage,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@8077c97{/storage/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22865072{/storage/rdd,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@563317c1{/storage/rdd/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d5d3a5c{/environment,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e0d16a4{/environment/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7e18ced7{/executors,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@305b43ca{/executors/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4601047{/executors/threadDump,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25e8e59{/executors/threadDump/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a0896b3{/static,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@635ff2a5{/,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@55adcf9e{/api,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@58601e7a{/jobs/job/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62735b13{/stages/stage/kill,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  SparkUI:54 - Bound SparkUI to my_ip, and started at http://my_ip:4040
2020-07-06 20:34:21 INFO  SparkContext:54 - Added JAR file:/home/spark/my-job.jar at spark://my_ip:46677/jars/my-job.jar with timestamp 1594067661464
2020-07-06 20:34:21 WARN  FairSchedulableBuilder:66 - Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
2020-07-06 20:34:21 INFO  FairSchedulableBuilder:54 - Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1
2020-07-06 20:34:21 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://spark-master.net:7077...
2020-07-06 20:34:21 INFO  TransportClientFactory:267 - Successfully created connection to spark-master.net/my_ip:7077 after 14 ms (0 ms spent in bootstraps)
2020-07-06 20:34:21 INFO  StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20200706223421-1147
2020-07-06 20:34:21 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33659.
2020-07-06 20:34:21 INFO  NettyBlockTransferService:54 - Server created on my_ip:33659
2020-07-06 20:34:21 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-07-06 20:34:21 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  BlockManagerMasterEndpoint:54 - Registering block manager my_ip:33659 with 2004.6 MB RAM, BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  BlockManager:54 - external shuffle service port = 8888
2020-07-06 20:34:21 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, my_ip, 33659, None)
2020-07-06 20:34:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bc16fe2{/metrics/json,null,AVAILABLE,@Spark}
2020-07-06 20:34:21 INFO  EventLoggingListener:54 - Logging events to hdfs://my_hdfs_ip:54310/sparkEventLogs/app-20200706223421-1147
2020-07-06 20:34:21 INFO  Utils:54 - Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
2020-07-06 20:34:21 INFO  StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

and the spark job makes no further progress. It looks like some kind of network issue. Maybe the workers can't reach the spark master when the job is submitted from a container? I would be glad to get any advice or help from you. Thanks.

Most likely the executors cannot reach the driver running inside the container. You need to look at the option spark.driver.host and set it to an IP of the container that is visible from the outside; otherwise Spark inside the container will advertise the internal Docker network address. You also need to set spark.driver.bindAddress to the container-local address so that Spark is able to perform the bind.
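For example, a minimal sketch (my_host_ip and my_container_ip are placeholders, not values from the question; substitute the address the executors can reach and the address local to the container):

/home/spark-2.3.3/bin/spark-submit \
  --master=spark://spark-master.net:7077 \
  --conf spark.driver.host=my_host_ip \
  --conf spark.driver.bindAddress=my_container_ip \
  --class=com.mysparkjob.Main \
  /home/spark/my-job.jar

Since the container here runs with --network host, it shares the host's network stack, so in this particular setup both settings would normally point at the host's externally reachable IP.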

From the documentation:

spark.driver.bindAddress - It also allows a different address from the local one to be advertised to executors or external systems. This is useful, e.g., when running containers with bridged networks. For this to properly work, the different ports used by the driver (RPC, block manager and UI) need to be forwarded from the container's host.
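To illustrate that last point: only in a bridged-network setup (not the --network host setup from the question) would you need to pin the driver's ports and publish them from the host. A hedged sketch, where 40000 and 40001 are arbitrary placeholder ports; spark.driver.port and spark.blockManager.port are standard Spark settings, and 4040 is the default Spark UI port:

docker run -d -p 4040:4040 -p 40000:40000 -p 40001:40001 ... airflow

/home/spark-2.3.3/bin/spark-submit \
  --master=spark://spark-master.net:7077 \
  --conf spark.driver.port=40000 \
  --conf spark.blockManager.port=40001 \
  --class=com.mysparkjob.Main \
  /home/spark/my-job.jar

Fixing the ports makes the -p mappings predictable; with random ports the forwarding could not be set up in advance.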
