
Configure Apache Spark for AWS EKS

I'd like to start working with Apache Spark on Kubernetes, but I don't have experience with it. I installed Spark via a Helm chart with ServiceType "LoadBalancer".
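For reference, the Helm installation step might look like the sketch below. The chart (Bitnami's) and the release name `my-spark` are assumptions, since the question doesn't say which chart was used:

```shell
# Sketch: install standalone Spark on the cluster via Helm, exposing the
# master through an AWS load balancer (assumes the Bitnami chart; the
# release name "my-spark" is arbitrary).
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-spark bitnami/spark \
  --set service.type=LoadBalancer
```

With `service.type=LoadBalancer` on EKS, Kubernetes provisions an ELB whose DNS name then serves as the `spark://...:7077` master address.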

spark-submit --master 'spark://LOADBALANCER.elb.eu-central-1.amazonaws.com:7077' \
--deploy-mode client \
--conf spark.kubernetes.container.image='MY_IMAGE' test.py

This is my test code, test.py:

from pyspark.sql import SparkSession

spark_session = SparkSession.builder \
    .getOrCreate()
l = [('Alice', 1)]
spark_session.createDataFrame(l).show()

Running locally on a microk8s cluster works, but running the same way against an AWS EKS cluster fails with the following warning repeating endlessly:

22/02/16 17:36:01 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
22/02/16 17:36:16 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Is there a way to develop the user code and run it against the Kubernetes cluster, or should I create a new Docker image every time? Are there any best practices for Apache Spark on EKS?

Try changing to --deploy-mode cluster.
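A cluster-mode submission usually also means switching from the standalone `spark://` master to Spark's native Kubernetes scheduler, pointing at the EKS API server. A sketch, where the endpoint, service account, and S3 upload path are all placeholders to adapt (`spark.kubernetes.file.upload.path` lets spark-submit upload the local test.py so a new image isn't needed for every code change):

```shell
# Sketch: Kubernetes-native submission in cluster mode (all names,
# the service account, and the S3 bucket are assumptions).
spark-submit \
  --master k8s://https://EKS_API_SERVER_ENDPOINT:443 \
  --deploy-mode cluster \
  --name spark-eks-test \
  --conf spark.kubernetes.container.image='MY_IMAGE' \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.file.upload.path=s3a://MY_BUCKET/spark-uploads \
  test.py
```

In cluster mode the driver runs inside the cluster, which avoids the problem of executors being unable to reach a driver running on your local machine; that unreachable driver is a common cause of the "Initial job has not accepted any resources" warning.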


 