
Exposing Spark Driver and Worker stdout stderr logs in Kubernetes to History Server

I'm using Spark 3.0.0 with a Kubernetes master and running the Spark job in cluster mode. The spark-submit command is below:

./spark-submit \
--master=k8s://https://api.k8s.my-domain.com \
--deploy-mode cluster \
--name sparkle \
--num-executors 2 \
--executor-cores 2 \
--executor-memory 2g \
--driver-memory 2g \
--class com.myorg.sparkle.Sparkle \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties \
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties \
--conf spark.kubernetes.submission.waitAppCompletion=false \
--conf spark.kubernetes.allocation.batch.delay=10s \
--conf spark.kubernetes.appKillPodDeletionGracePeriod=20s \
--conf spark.kubernetes.node.selector.workloadType=spark \
--conf spark.kubernetes.driver.pod.name=sparkle-driver \
--conf spark.kubernetes.container.image=custom-registry/spark:latest \
--conf spark.kubernetes.namespace=spark \
--conf spark.eventLog.dir='s3a://my-bucket/spark-logs' \
--conf spark.history.fs.logDirectory='s3a://my-bucket/spark-logs' \
--conf spark.eventLog.enabled='true' \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
--conf spark.kubernetes.driver.annotation.iam.amazonaws.com/role=K8sRoleSpark \
--conf spark.kubernetes.executor.annotation.iam.amazonaws.com/role=K8sRoleSpark \
--conf spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secrets:key \
--conf spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secrets:secret \
--conf spark.kubernetes.executor.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secrets:key \
--conf spark.kubernetes.executor.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secrets:secret \
--conf spark.hadoop.fs.s3a.endpoint=s3.ap-south-1.amazonaws.com \
--conf spark.hadoop.com.amazonaws.services.s3.enableV4=true \
--conf spark.yarn.maxAppAttempts=4 \
--conf spark.yarn.am.attemptFailuresValidityInterval=1h \
s3a://dp-spark-jobs/sparkle/jars/sparkle.jar \
--commonConfigPath https://my-bucket.s3.ap-south-1.amazonaws.com/sparkle/configs/prod_main_configs.yaml \
--jobConfigPath https://my-bucket.s3.ap-south-1.amazonaws.com/sparkle/configs/cc_configs.yaml \
--filePathDate 2021-03-29 20000

I host the history server in a separate pod using the same image. The history server is able to read all the event logs and show the application details, and the job executes successfully.

However, I do not see the driver pod's and worker pods' stdout and stderr logs in the History Server. How can I enable it?

Similar to this question.

Unfortunately, it appears that there is no way for the driver-scoped logs to be piped to the spark-submit scope. From the docs:

Logs can be accessed using the Kubernetes API and the kubectl CLI. When a Spark application is running, it's possible to stream logs from the application using:

kubectl -n=<namespace> logs -f <driver-pod-name>
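If the pod logs need to outlive the pods, one workaround (outside the History Server UI) is to capture them with kubectl before the pods are cleaned up and park them next to the event logs in S3. A minimal sketch, assuming the namespace, driver pod name, and bucket from the question, and that the AWS CLI is already configured on the machine running it:

```shell
# Stream the driver's stdout/stderr to a local file while the app runs
kubectl -n spark logs -f sparkle-driver > sparkle-driver.log

# Executor pod names are generated; list them via the spark-role label
# that Spark on Kubernetes applies to its pods
kubectl -n spark get pods -l spark-role=executor -o name

# After completion, copy the captured logs next to the event logs
aws s3 cp sparkle-driver.log s3://my-bucket/spark-logs/sparkle-driver.log
```

These logs will not appear in the History Server's stdout/stderr tabs (that UI is backed by the event logs, not pod output), but they at least survive pod deletion and sit alongside the event logs for the same application.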

