How can I prune executors' logs in Spark Streaming?

I'm working on a Spark Streaming job that runs in standalone mode. By default the executors append their logs to the stderr and stdout files under $SPARK_HOME/work/app_idxxxx/. The problem shows up when the app runs for a long time, say a month or more, and writes a huge amount of output to the stderr file. I would like to roll the stderr file daily, keep a week of rolled files, and archive (or delete) anything older. I changed log4j.properties to use org.apache.log4j.RollingFileAppender and directed the logs to a file instead of stderr, but the file doesn't respect the rolling settings and keeps growing. A cron job doesn't work either, since Spark holds a handle to that specific file and renaming it probably has no effect.

I couldn't find any documentation for these specific logs. I would really appreciate any help.

After digging more, I finally found how to resolve the issue, and I'm posting it here so that the next person doesn't have to go through the same trial and error. The settings for those logs live in two different places. First, in $SPARK_HOME/conf/spark-defaults.conf on every worker machine, add these three lines:

spark.executor.logs.rolling.time.interval  daily
spark.executor.logs.rolling.strategy  time
spark.executor.logs.rolling.maxRetainedFiles  7

The other file that you need to change on every worker machine is $SPARK_HOME/conf/spark-env.sh. Add the following lines:

SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.enabled=true
 -Dspark.worker.cleanup.interval=1800
 -Dspark.worker.cleanup.appDataTtl=864000
 -Dspark.executor.logs.rolling.strategy=time
 -Dspark.executor.logs.rolling.time.interval=daily
 -Dspark.executor.logs.rolling.maxRetainedFiles=7"

export SPARK_WORKER_OPTS
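
The two cleanup values above are given in seconds, which makes them easy to misread. As a small sketch (the helper function names are hypothetical and exist only for this illustration), they can be converted to friendlier units like this:

```shell
#!/bin/sh
# Values copied from the SPARK_WORKER_OPTS line above; both are in seconds.
cleanup_interval=1800
app_data_ttl=864000

# Hypothetical helper names, used only for this illustration.
seconds_to_minutes() { echo $(( $1 / 60 )); }
seconds_to_days() { echo $(( $1 / 86400 )); }

# Prints: cleanup runs every 30 minutes
echo "cleanup runs every $(seconds_to_minutes "$cleanup_interval") minutes"
# Prints: stopped applications' work directories are kept for 10 days
echo "stopped applications' work directories are kept for $(seconds_to_days "$app_data_ttl") days"
```

So with these settings the worker checks every 30 minutes and removes the work directories of applications that stopped more than 10 days ago. Note that this cleanup only touches stopped applications; for a long-running streaming job it is the rolling settings that keep stderr in check.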

After these changes it started working properly. Hope this helps some people:)

If you are in standalone mode, exporting a single environment variable is enough:

export SPARK_WORKER_OPTS="-Dspark.executor.logs.rolling.strategy=time -Dspark.executor.logs.rolling.time.interval=daily -Dspark.executor.logs.rolling.maxRetainedFiles=7"

You can also refer to: http://apache-spark-user-list.1001560.n3.nabble.com/Executor-Log-Rotation-Is-Not-Working-td18024.html
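
To check whether rolling has actually kicked in after restarting the workers, one option is to list the executor log files under the work directory. This is only a rough sketch: the function name is made up for this example, and because the exact suffix of rolled files isn't stated above, the pattern is deliberately loose and matches anything starting with stderr or stdout.

```shell
#!/bin/sh
# Hypothetical helper: list every executor log file (active and rolled)
# below the given Spark work directory, sorted by path.
list_executor_logs() {
    work_dir="$1"
    find "$work_dir" -type f \( -name 'stderr*' -o -name 'stdout*' \) | sort
}

# Assumes the default standalone layout $SPARK_HOME/work described in the question.
if [ -n "$SPARK_HOME" ] && [ -d "$SPARK_HOME/work" ]; then
    list_executor_logs "$SPARK_HOME/work"
fi
```

If rolling works, after a few days each executor directory should contain the active stderr file plus a handful of rolled files, and never more than the maxRetainedFiles count of older ones.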
