
Spark history-server stderr and stdout logs location when working on S3

I deployed a Spark history server that is supposed to serve multiple environments: all Spark clusters will write their event logs to one bucket, and the history server will read from that bucket.
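For context, the setup is roughly the following (the bucket name my-spark-logs is a placeholder, and the s3a:// scheme assumes the hadoop-aws dependency is on the classpath):

# spark-defaults.conf on each cluster (writer side)
spark.eventLog.enabled           true
spark.eventLog.dir               s3a://my-spark-logs/events

# spark-defaults.conf on the history-server host (reader side)
spark.history.fs.logDirectory    s3a://my-spark-logs/events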

I got everything working and set up, but when I try to access the stdout/stderr of a certain task, the UI points to the private IP of the worker the task was running on (e.g. http://10.192.21.80:8081/logPage/?appId=app-20220510103043-0001&executorId=1&logType=stderr).

I want to access those logs from the UI, but of course there is no access to those internal IPs (private subnets and private IPs). Isn't there a way to also upload those stderr/stdout logs to the bucket and then access them from the history server UI?

I couldn't find anything about this in the documentation.


I am assuming that you are referring to running Spark jobs on AWS EMR here.

If you have enabled logging to an S3 bucket on your cluster [1], all the application logs are written to the S3 bucket path specified when launching the cluster.
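As a sketch, the log URI is set at cluster launch, e.g. via the AWS CLI (the bucket path, cluster name, release label, and instance settings below are all placeholders):

# launch an EMR cluster that archives its logs to S3
aws emr create-cluster \
    --name "spark-cluster" \
    --release-label emr-6.6.0 \
    --applications Name=Spark \
    --log-uri s3://my-emr-logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles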

You should find the logs under the following paths:

s3://<bucketpath>/<clusterid>/containers/<applicationId>/<containerId>/stdout.gz

s3://<bucketpath>/<clusterid>/containers/<applicationId>/<containerId>/stderr.gz
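EMR uploads these logs to S3 periodically (typically within a few minutes), after which you can browse and read them with the AWS CLI; all bucket names and IDs below are placeholders:

# list the container logs for a cluster
aws s3 ls s3://my-emr-logs/j-XXXXXXXXXXXXX/containers/ --recursive

# stream and decompress an executor's stderr without saving a local file
aws s3 cp s3://my-emr-logs/j-XXXXXXXXXXXXX/containers/application_0000000000000_0001/container_0000000000000_0001_01_000001/stderr.gz - | gunzip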

Hope the above information was helpful!

References:

[1] Configure cluster logging and debugging - Enable the debugging tool - https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-debugging.html#emr-plan-debugging-logs-archive-debug
