Spark Streaming Standalone: Save logs

I am running a Spark Streaming application on a standalone setup (version 1.6.1). When I run the application using spark-submit, the logs show up in the terminal. These are useful later on, for instance, to understand why the application failed, if it failed.

From what I read in the documentation, I have the spark.eventLog.enabled flag set to true. But this only saves the event logs to the /tmp/spark-events folder, and those logs are of little use to me, as far as I can tell. My jobs fail often, due to many exceptions. What is the correct way to store the logs that show up in the terminal (the driver logs, I am guessing?) and analyse my exceptions?

If you only want to log events that happen on the driver side, the easiest way is to provide Spark with a logging configuration file. By default, Spark uses log4j 1.2 to do the logging, so when starting your spark-submit job, you can use the spark.driver.extraJavaOptions flag to pass in your log4j configuration, and add a RollingFileAppender to it:

spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/your/log4j.xml

Note that log4j treats this value as a URL, so an absolute path needs the file: prefix; a bare name would be looked up on the classpath instead.
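For context, a complete spark-submit invocation might look like the following sketch; the class name, master URL, and jar file are placeholders for your own application:

spark-submit \
  --class com.example.MyStreamingApp \
  --master spark://your-master:7077 \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/your/log4j.xml" \
  my-streaming-app.jar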

This is a basic pattern for a log4j 1.2 rolling-appender XML configuration:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
    <appender name="RollingFileAppender" class="org.apache.log4j.RollingFileAppender">
        <param name="File" value="log.txt" />
        <param name="Append" value="true" />
        <param name="MaxBackupIndex" value="10" />
        <param name="MaxFileSize" value="100KB" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d [%t] %-5p %c %x - %m%n" />
        </layout>
    </appender>
    <root>
        <level value="DEBUG" />
        <appender-ref ref="RollingFileAppender" />
    </root>
</log4j:configuration>
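If the log file does not appear, it helps to first confirm that log4j actually loaded your configuration. A minimal sketch, using log4j 1.2's built-in -Dlog4j.debug switch to print initialization diagnostics to stderr (the jar name is a placeholder):

spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.debug -Dlog4j.configuration=file:/path/to/your/log4j.xml" \
  my-streaming-app.jar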

You can find more information in the Spark Configuration section of the Spark documentation.

If you want to also log events that happen on the worker nodes, then I suggest you look into an external service that can collect logs from distributed systems. A sketch of a lighter-weight alternative follows.
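Short of a full log-aggregation service, one commonly used pattern (a sketch, with placeholder paths and file names) is to ship a separate log4j configuration to the executors with --files and point spark.executor.extraJavaOptions at it. --files copies the file into each executor's working directory, which the relative file: URL below assumes:

spark-submit \
  --files /path/to/executor-log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:executor-log4j.properties" \
  my-streaming-app.jar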

I added these lines to SPARK_HOME/conf/log4j.properties

log4j.rootLogger=ERROR, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/application.log
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

Any log4j appender can be used; I am using the file appender here.
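One caveat with this configuration: append=false means the file is overwritten on every run, and a plain FileAppender grows without bound within a run. If that matters, here is a size-bounded sketch of the same setup using log4j's RollingFileAppender (the sizes and backup count are just examples):

log4j.rootLogger=ERROR, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/tmp/application.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n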

P.S. The stdout logs are still missing; only the stderr logs are being saved. I have yet to figure out why.
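In the meantime, a blunt workaround is to capture the driver's stdout and stderr with plain shell redirection when launching spark-submit (the file names here are arbitrary):

spark-submit --master spark://your-master:7077 my-streaming-app.jar \
  > driver-stdout.log 2> driver-stderr.log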
