Spark submit (2.3) on kubernetes cluster from Python
Now that Kubernetes is integrated directly with Spark in 2.3, my spark-submit from the console executes correctly against a Kubernetes master without any Spark master pods running; Spark handles all the k8s details:
spark-submit \
--deploy-mode cluster \
--class com.app.myApp \
--master k8s://https://myCluster.com \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.app.name=myApp \
--conf spark.executor.instances=10 \
--conf spark.kubernetes.container.image=myImage \
local:///myJar.jar
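For reference, the same submission can be driven from Python by shelling out to the CLI (a minimal sketch; it assumes spark-submit is on the PATH and the local kubeconfig can reach the cluster, and reuses the class, image, and jar names from the command above):

```python
import subprocess

# Same flags as the CLI invocation above, as an argument list.
cmd = [
    "spark-submit",
    "--deploy-mode", "cluster",
    "--class", "com.app.myApp",
    "--master", "k8s://https://myCluster.com",
    "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=spark",
    "--conf", "spark.app.name=myApp",
    "--conf", "spark.executor.instances=10",
    "--conf", "spark.kubernetes.container.image=myImage",
    "local:///myJar.jar",
]

# Requires a local Spark 2.3 distribution; spark-submit itself talks to
# the Kubernetes API server and creates the driver pod.
# subprocess.run(cmd, check=True)
```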
What I am trying to do is run a spark-submit via AWS Lambda against my k8s cluster. Previously I used the command via the Spark master REST API directly (without Kubernetes):
import json
import requests

# 'parameters' is the submission payload expected by the standalone
# master's REST endpoint (appResource, mainClass, sparkProperties, etc.)
request = requests.Request(
    'POST',
    "http://<master-ip>:6066/v1/submissions/create",
    data=json.dumps(parameters))
prepared = request.prepare()
session = requests.Session()
response = session.send(prepared)
And it worked. Now I want to integrate Kubernetes and do it similarly: submit an API request to my Kubernetes cluster from Python and have Spark handle all the k8s details, ideally something like:
request = requests.Request(
'POST',
"k8s://https://myK8scluster.com:443",
data=json.dumps(parameters))
Is this possible with the Spark 2.3/Kubernetes integration?
I'm afraid that is impossible for Spark 2.3, if you are using native Kubernetes support.
Based on the description in the deployment instructions, the submission process consists of several steps:

- spark-submit creates a Spark driver running within a Kubernetes pod.
- The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
- When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in "completed" state in the Kubernetes API until it is eventually garbage collected or manually cleaned up.
So, in fact, you have nowhere to submit a job until you start a submission process, which launches the first Spark pod (the driver) for you. And after the application completes, everything is terminated.
Because running a fat container on AWS Lambda is not the best solution, and also because there is no way to run arbitrary commands inside the Lambda container itself (it is possible, but only with a hack; there is a blueprint about executing Bash inside an AWS Lambda), the simplest way is to write a small custom service that runs on a machine outside of AWS Lambda and provides a REST interface between your application and the spark-submit utility. I don't see any other way to do this without pain.
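Such a custom service could be as small as the following sketch, assuming a host with spark-submit on the PATH and kubeconfig access to the cluster; the endpoint, request schema, and field names here are all hypothetical, not a real API:

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_submit_command(params):
    """Translate a JSON payload into a spark-submit argument list."""
    cmd = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", params["master"],        # e.g. k8s://https://myCluster.com
        "--class", params["mainClass"],
        "--conf", "spark.app.name=" + params["appName"],
        "--conf", "spark.kubernetes.container.image=" + params["image"],
        "--conf", "spark.executor.instances=" + str(params.get("executors", 2)),
    ]
    cmd.append(params["appResource"])        # e.g. local:///myJar.jar
    return cmd

class SubmitHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        params = json.loads(body)
        # Shell out to spark-submit; it handles all the Kubernetes details.
        proc = subprocess.run(build_submit_command(params),
                              capture_output=True, text=True)
        self.send_response(200 if proc.returncode == 0 else 500)
        self.end_headers()
        self.wfile.write(proc.stdout.encode())

# To run the service:
# HTTPServer(("0.0.0.0", 8080), SubmitHandler).serve_forever()
```

Your Lambda function then POSTs its parameters JSON to this service instead of to the cluster directly.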