
Spark submit (2.3) on Kubernetes cluster from Python

So now that k8s is integrated directly with Spark in 2.3, my spark-submit from the console executes correctly against a Kubernetes master without any Spark master pods running; Spark handles all the k8s details:

spark-submit \
  --deploy-mode cluster \
  --class com.app.myApp \
  --master k8s://https://myCluster.com \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.app.name=myApp \
  --conf spark.executor.instances=10 \
  --conf spark.kubernetes.container.image=myImage \
  local:///myJar.jar

What I am trying to do is a spark-submit via AWS Lambda to my k8s cluster. Previously I issued the command via the Spark master REST API directly (without Kubernetes):

request = requests.Request(
    'POST',
    "http://<master-ip>:6066/v1/submissions/create",
    data=json.dumps(parameters))
prepared = request.prepare()
session = requests.Session()
response = session.send(prepared)
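For context, here is a sketch of what the `parameters` payload above might look like. The field names follow the Spark standalone master's `CreateSubmissionRequest` JSON format; the jar path, class name, and versions are placeholders, not values from the original question:

```python
# Illustrative payload for the standalone master's REST submission endpoint.
# Field names follow Spark's CreateSubmissionRequest; concrete values are
# placeholders and would need to match your deployment.
parameters = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:///path/to/myJar.jar",   # placeholder jar location
    "clientSparkVersion": "2.3.0",
    "mainClass": "com.app.myApp",
    "appArgs": [],
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.app.name": "myApp",
        "spark.master": "spark://<master-ip>:6066",
        "spark.submit.deployMode": "cluster",
    },
}
```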

And it worked. Now I want to integrate Kubernetes and do it similarly: submit an API request to my Kubernetes cluster from Python and have Spark handle all the k8s details, ideally something like:

request = requests.Request(
    'POST',
    "k8s://https://myK8scluster.com:443",
    data=json.dumps(parameters))

Is this possible with the Spark 2.3/Kubernetes integration?

I'm afraid that is impossible for Spark 2.3, if you are using native Kubernetes support.

Based on the description in the deployment instructions, the submission process consists of several steps:

  1. Spark creates a Spark driver running within a Kubernetes pod.
  2. The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
  3. When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.

So, in fact, there is nowhere to submit a job to until you start the submission process, which launches the first Spark pod (the driver) for you. And after the application completes, everything is terminated.

Because running a fat container on AWS Lambda is not the best solution, and also because there is no way to run arbitrary commands in the Lambda container itself (it is possible, but only with a hack; there is a blueprint about executing Bash inside an AWS Lambda), the simplest way is to write a small custom service that runs on a machine outside of AWS Lambda and provides a REST interface between your application and the spark-submit utility. I don't see any other way to do it without pain.
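A minimal sketch of such a bridge service, using only the Python standard library. The endpoint, port, and JSON field names (`master`, `main_class`, `conf`, `app_resource`) are all illustrative assumptions, not part of any existing API; the service simply translates a POSTed JSON payload into a spark-submit command line:

```python
# Sketch of a bridge service between an HTTP client (e.g. AWS Lambda) and
# spark-submit. Field names and port are assumptions for illustration.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_spark_submit_cmd(params):
    """Translate a JSON payload into a spark-submit command line."""
    cmd = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", params["master"],        # e.g. k8s://https://myCluster.com
        "--class", params["main_class"],
    ]
    for key, value in params.get("conf", {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(params["app_resource"])       # e.g. local:///myJar.jar
    return cmd

class SubmitHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        cmd = build_spark_submit_cmd(json.loads(body))
        # spark-submit must be on PATH of the machine running this service.
        result = subprocess.run(cmd, capture_output=True, text=True)
        self.send_response(200 if result.returncode == 0 else 500)
        self.end_headers()
        self.wfile.write(result.stdout.encode())

# To run the bridge on the host that has spark-submit installed:
#   HTTPServer(("0.0.0.0", 8080), SubmitHandler).serve_forever()
```

Your Lambda would then POST the same kind of JSON payload it previously sent to the standalone master's REST API, but to this service instead.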
