
Pending Spark pod on Google Kubernetes cluster: insufficient CPU

I'm trying to submit a Spark job through spark-submit to a Google Kubernetes cluster.

The Docker image is built from the official Spark Dockerfile in the 2.3.0 release.

The following is the submit script.

#! /bin/bash
spark-submit \
--master k8s://https://<master url> \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=<official image> \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.app.name=app-name \
--class ExpletivePI \
--name spark-pi \
local:///opt/spark/examples/spark-demo.jar

I can run this on my local minikube perfectly.

However, when I try to submit it to my Google Kubernetes cluster, the pod always stays unscheduled due to insufficient CPU.

0/3 nodes are available: 3 Insufficient cpu. 

kubectl describe node looks okay, and here is the describe output for the problematic pod:

Name:         spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver
Namespace:    default
Node:         <none>
Labels:       spark-app-selector=spark-3e8ff877bebd46be9fc8d956531ba186
              spark-role=driver
Annotations:  spark-app-name=spark-pi
Status:       Pending
IP:           
Containers:
  spark-kubernetes-driver:
    Image:      geekbeta/spark:v2
    Port:       <none>
    Host Port:  <none>
    Args:
      driver
    Limits:
      memory:  1408Mi
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:
      SPARK_DRIVER_MEMORY:        1g
      SPARK_DRIVER_CLASS:         ExpletivePI
      SPARK_DRIVER_ARGS:          
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_MOUNTED_CLASSPATH:    /opt/spark/tang_stuff/spark-demo.jar:/opt/spark/tang_stuff/spark-demo.jar
      SPARK_JAVA_OPT_0:           -Dspark.app.name=spark-pi
      SPARK_JAVA_OPT_1:           -Dspark.app.id=spark-3e8ff877bebd46be9fc8d956531ba186
      SPARK_JAVA_OPT_2:           -Dspark.driver.host=spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver-svc.default.svc
      SPARK_JAVA_OPT_3:           -Dspark.submit.deployMode=cluster
      SPARK_JAVA_OPT_4:           -Dspark.driver.blockManager.port=7079
      SPARK_JAVA_OPT_5:           -Dspark.kubernetes.executor.podNamePrefix=spark-pi-e890cd00394b3b20942f22d0a9173c1c
      SPARK_JAVA_OPT_6:           -Dspark.master=k8s://https://35.229.152.59
      SPARK_JAVA_OPT_7:           -Dspark.kubernetes.authenticate.driver.serviceAccountName=spark
      SPARK_JAVA_OPT_8:           -Dspark.executor.instances=1
      SPARK_JAVA_OPT_9:           -Dspark.kubernetes.container.image=geekbeta/spark:v2
      SPARK_JAVA_OPT_10:          -Dspark.kubernetes.driver.pod.name=spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver
      SPARK_JAVA_OPT_11:          -Dspark.jars=/opt/spark/tang_stuff/spark-demo.jar,/opt/spark/tang_stuff/spark-demo.jar
      SPARK_JAVA_OPT_12:          -Dspark.driver.port=7078
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-9gdsb (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  spark-token-9gdsb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-9gdsb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  3m (x125 over 38m)  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.

My cluster has 3 CPUs and 11G RAM in total. I'm really confused and don't know what to do at this point; any advice or comments would be greatly appreciated. Thank you in advance!

Problem solved: it turns out that the driver pod requests 1 CPU by default, which in my case GCP cannot accommodate, since each node in my GCP cluster has only one CPU, and system pods already reserve part of it, so a full 1-CPU request can never fit on any node.
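
One way to confirm this is to compare each node's allocatable CPU against what is already requested; on a 1-vCPU GKE node, kube-system pods typically reserve a few hundred millicores. A sketch, assuming a recent kubectl (the section heading may vary slightly between versions):

# Show per-node allocatable CPU and what is already requested
kubectl describe nodes | grep -A 8 "Allocated resources"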

By changing the driver pod's CPU request to a lower value, it can run on GCP.
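
For reference, a sketch of the adjusted submit script (the 0.5 is an example value; in Spark 2.3 the driver pod's CPU request is derived from spark.driver.cores, which accepts fractional values in Kubernetes mode, so a request below 1 lets the pod fit on a 1-CPU node):

#! /bin/bash
# Same as the script above, but with the driver's CPU request lowered
# so it fits on a node whose allocatable CPU is below 1 full core.
spark-submit \
--master k8s://https://<master url> \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.driver.cores=0.5 \
--conf spark.kubernetes.container.image=<official image> \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--class ExpletivePI \
--name spark-pi \
local:///opt/spark/examples/spark-demo.jar

Later Spark releases also expose dedicated settings for this: spark.kubernetes.executor.request.cores (Spark 2.4+) and spark.kubernetes.driver.request.cores (Spark 3.0+).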
