
How to forward logs to S3 from YARN containers?

I am setting up Spark on a Hadoop YARN cluster on AWS EC2 machines. The cluster is ephemeral (running for a few hours a day), so I want to forward the container logs it generates to S3. I have seen that Amazon EMR supports this feature by forwarding logs to S3 every 5 minutes.

Is there any built-in configuration in Hadoop/Spark that I can leverage?

Any other solution to this problem would also be helpful.

Sounds like you're looking for YARN log aggregation.

I haven't tried it myself, but you should be able to configure yarn.nodemanager.remote-app-log-dir to point at an S3 filesystem, assuming you've set up your core-site.xml accordingly.
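A minimal sketch of what that might look like, assuming the S3A connector (hadoop-aws and its AWS SDK dependency) is on the classpath; the bucket name and credentials are placeholders:

```xml
<!-- core-site.xml: S3A credentials (placeholders; an instance profile
     on EC2 avoids hardcoding keys) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>

<!-- yarn-site.xml: enable log aggregation and point it at S3 -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>s3a://my-yarn-logs/app-logs</value>
</property>
```

With this in place, the NodeManagers upload each application's container logs to the s3a:// path when aggregation runs, and `yarn logs -applicationId <app-id>` can read them back.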

yarn.log-aggregation.retain-seconds and yarn.log-aggregation.retain-check-interval-seconds control how long aggregated logs are kept and how often expired ones are deleted. Note that by default, aggregation only uploads logs when an application finishes; for long-running applications you can enable periodic uploads with yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds.
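For example, in yarn-site.xml (the values shown are illustrative, not recommendations):

```xml
<!-- Keep aggregated logs for 3 days -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>259200</value>
</property>
<!-- Check for expired logs once a day -->
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>86400</value>
</property>
<!-- Upload logs of still-running applications roughly every hour -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```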

An alternative is to build your own AMI with Fluentd or Filebeat tailing the local YARN log directories, then configure the forwarder to write to a remote location. For searchable logs, Elasticsearch (or one of the AWS logging services) would be a better destination than just S3.
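As a rough sketch of the forwarder approach with Fluentd, assuming the fluent-plugin-s3 output plugin is installed; the log path, bucket name, and region are placeholders you'd adjust for your cluster:

```
# /etc/td-agent/td-agent.conf (sketch)
<source>
  @type tail
  # NodeManager container log location varies by distribution
  path /var/log/hadoop-yarn/containers/*/*/*.log
  pos_file /var/log/td-agent/yarn-containers.pos
  tag yarn.container
  <parse>
    @type none
  </parse>
</source>

<match yarn.container>
  @type s3
  s3_bucket my-yarn-logs
  s3_region us-east-1
  path yarn/
  <buffer>
    @type file
    path /var/log/td-agent/s3-buffer
    # Flush a chunk to S3 roughly every 5 minutes
    timekey 300
  </buffer>
</match>
```

Swapping the `s3` match for an `elasticsearch` output (via fluent-plugin-elasticsearch) gives you searchable logs instead of flat files.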
