Spark log parser with Java using regex

I'm trying to write a Java parser for a Spark log produced by Log4j. I wrote this code to recognize a "Starting task" log line, but it never matches and I can't figure out why.

This is the regex:

public static final String datePattern = "\\d{4}\\-\\d{2}\\-\\d{2}";
public static final String timePattern = "\\d{2}\\:\\d{2}\\:\\d{2}\\,\\d{3}";
public static final String timeStampPattern = "(?<timeStamp>" + datePattern + "\\s" + timePattern + ")";
public static final String logLevelPattern = "(?<logLevel>\\w+)";
public static final String loggingClassPattern = "(?<loggingClass>\\w+:)";
public static final String taskUIdPattern = "(?<UIdPattern>\\d+)";
public static final String taskIdPattern = "\\d.\\d:\\d+";
public static final String taskStatusPattern = null;
public static final String endTaskLabelPattern = null;
public static final String stringPatternStartTask = timeStampPattern + 
        " " + logLevelPattern + 
        " " + loggingClassPattern + 
        " " + "Starting task" +
        " " + taskIdPattern +
        " " + "as TID" +
        " " + taskUIdPattern +
        "\\z";

This is the parsing attempt:

Pattern patternStartTask = Pattern.compile(stringPatternStartTask);
...
while((temp = br.readLine()) != null) {
if((m = patternStartTask.matcher(temp)).matches()) {
    System.out.println(temp);
    le = new StartTaskEvent();
}
...
if(m != null && le != null) {
    le.setTaskId(m.group("taskId"));
    le.setLogLevel(m.group("logLevel"));
    le.setLoggingClass(m.group("loggingClass"));
    le.setTimeStamp(sdf.parse(m.group("timeStamp")));
    result.add(le);
}
}

The lines I'm trying to recognize are like this one:

2016-01-08 14:01:02 INFO TaskSetManager: Starting task 1.0:0 as TID 0 on executor 1

Your regex ends with:

    " " + "as TID" +
    " " + taskUIdPattern +
    "\\z";

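but in your log line the text "on executor 1" follows the part matched by taskUIdPattern. Since matches() only succeeds when the whole line matches, and \\z anchors the pattern to the end of the input, the trailing text makes the match fail. You have to add " on executor 1" or, better, " on executor \\d+" to your regex after taskUIdPattern.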
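The suggestion above can be sketched as a small self-contained program. This is only a sketch under a few assumptions that go beyond the original answer: the sample line has no ",123" millisecond part, so the milliseconds are made optional here; the dot in the task id is escaped so that it matches a literal dot; and the task id is wrapped in a group named taskId, because the parsing code calls m.group("taskId") and Java throws IllegalArgumentException when no group with that name exists. The class name, the constant names and the executor group are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class StartTaskPatternSketch {
    static final String DATE = "\\d{4}-\\d{2}-\\d{2}";
    // Assumption: milliseconds are optional, since the sample line has none.
    static final String TIME = "\\d{2}:\\d{2}:\\d{2}(?:,\\d{3})?";
    static final String TIMESTAMP = "(?<timeStamp>" + DATE + "\\s" + TIME + ")";
    static final String LOG_LEVEL = "(?<logLevel>\\w+)";
    static final String LOGGING_CLASS = "(?<loggingClass>\\w+:)";
    // Named group added so that m.group("taskId") works; the dot is escaped.
    static final String TASK_ID = "(?<taskId>\\d+\\.\\d+:\\d+)";
    static final String TASK_UID = "(?<taskUId>\\d+)";
    // Group for the trailing "on executor N" part of the line.
    static final String EXECUTOR = "(?<executor>\\d+)";

    static final Pattern START_TASK = Pattern.compile(
            TIMESTAMP
            + " " + LOG_LEVEL
            + " " + LOGGING_CLASS
            + " Starting task " + TASK_ID
            + " as TID " + TASK_UID
            + " on executor " + EXECUTOR
            + "\\z");

    public static void main(String[] args) {
        String line = "2016-01-08 14:01:02 INFO TaskSetManager: "
                + "Starting task 1.0:0 as TID 0 on executor 1";
        Matcher m = START_TASK.matcher(line);
        // matches() succeeds only if the entire line fits the pattern.
        if (m.matches()) {
            System.out.println("timeStamp    = " + m.group("timeStamp"));
            System.out.println("logLevel     = " + m.group("logLevel"));
            System.out.println("loggingClass = " + m.group("loggingClass"));
            System.out.println("taskId       = " + m.group("taskId"));
            System.out.println("taskUId      = " + m.group("taskUId"));
            System.out.println("executor     = " + m.group("executor"));
        } else {
            System.out.println("no match");
        }
    }
}
```

If real log lines carry more text after the executor number, the tail of the pattern would need to be loosened further, for example by replacing "\\z" with ".*".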
