Good day,
I am running a Flink (v1.7.1) streaming job on AWS EMR 5.20, and I would like to have all of the TaskManager and JobManager logs for my job stored in S3. Logback is used, as recommended by the Flink team. As it is a long-running job, I want the logs to be:
What I have tried:
<appender name="ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.file}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>%d{yyyy-MM-dd HH}.%i.log</fileNamePattern>
        <maxFileSize>30MB</maxFileSize>
        <maxHistory>3</maxHistory>
        <totalSizeCap>50MB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n</pattern>
    </encoder>
</appender>
What I have observed so far:
In short, out of the 3 requirements, I could only achieve either (1) or (2 & 3), not all three together.
Could you please help me with this?
Thanks and best regards,
Averell
From what I know, the automatic backup of logs to S3 that EMR supports only works at the end of the job, since it's based on the background log-loader that AWS originally implemented for batch jobs. Maybe there's a way to get it to work for rolling logs; I have just never heard of one.
I haven't tried this myself, but if I had to then I'd probably try the following:
1. Mount the S3 bucket using S3fs.
2. Use logrotate (or equivalent) to automatically copy and clean up the log files.

You can use a bootstrap action to automatically set up all of the above.

If S3fs gives you problems, then you can do a bit more scripting and directly use the aws s3 command to sync logs, and then remove them once they've been copied.
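
To make the last suggestion a bit more concrete, here is a minimal, untested-on-EMR sketch of the "copy with aws s3, then remove" idea in Python. It assumes the AWS CLI is installed on the node, that rolled-over files end in .log (as in the fileNamePattern from the question), and that the file currently being written is known by name. The helper names (rolled_log_files, copy_command, ship_logs), the bucket prefix, and the paths in the usage note are all hypothetical placeholders, not anything provided by Flink or EMR.

```python
import subprocess
from pathlib import Path


def rolled_log_files(log_dir, active_name):
    """Return rolled-over .log files in log_dir, skipping the file still being written."""
    return sorted(
        p for p in Path(log_dir).glob("*.log")
        if p.is_file() and p.name != active_name
    )


def copy_command(path, s3_prefix):
    """Build the `aws s3 cp` argument list for one local file."""
    return ["aws", "s3", "cp", str(path), f"{s3_prefix.rstrip('/')}/{path.name}"]


def ship_logs(log_dir, active_name, s3_prefix, run=subprocess.run):
    """Copy each rolled file to S3 and delete the local copy only if the copy succeeded."""
    shipped = []
    for path in rolled_log_files(log_dir, active_name):
        result = run(copy_command(path, s3_prefix))
        if result.returncode == 0:
            path.unlink()  # remove the local file only after a successful upload
            shipped.append(path.name)
    return shipped
```

Something like ship_logs("/path/to/flink/logs", "taskmanager.log", "s3://my-bucket/flink-logs") could then be run periodically (for example from cron, set up via a bootstrap action). The active file is skipped on purpose so that Logback can keep appending to it; the directory, file name, and bucket here are placeholders to adapt.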