Spark log parser with Java using regex
I'm trying to create a Java parser for a Spark log created with Log4J. I wrote this code to recognize a "starting task" log line, but it doesn't work and I can't figure out why.
This is the regex:
public static final String datePattern = "\\d{4}\\-\\d{2}\\-\\d{2}";
public static final String timePattern = "\\d{2}\\:\\d{2}\\:\\d{2}\\,\\d{3}";
public static final String timeStampPattern = "(?<timeStamp>" + datePattern + "\\s" + timePattern + ")";
public static final String logLevelPattern = "(?<logLevel>\\w+)";
public static final String loggingClassPattern = "(?<loggingClass>\\w+:)";
public static final String taskUIdPattern = "(?<UIdPattern>\\d+)";
public static final String taskIdPattern = "\\d.\\d:\\d+";
public static final String taskStatusPattern = null;
public static final String endTaskLabelPattern = null;
public static final String stringPatternStartTask = timeStampPattern +
    " " + logLevelPattern +
    " " + loggingClassPattern +
    " " + "Starting task" +
    " " + taskIdPattern +
    " " + "as TID" +
    " " + taskUIdPattern +
    "\\z";
This is the parsing attempt:
Pattern patternStartTask = Pattern.compile(stringPatternStartTask);
...
while((temp = br.readLine()) != null) {
    if((m = patternStartTask.matcher(temp)).matches()) {
        System.out.println(temp);
        le = new StartTaskEvent();
    }
    ...
    if(m != null && le != null) {
        le.setTaskId(m.group("taskId"));
        le.setLogLevel(m.group("logLevel"));
        le.setLoggingClass(m.group("loggingClass"));
        le.setTimeStamp(sdf.parse(m.group("timeStamp")));
        result.add(le);
    }
}
The lines I'm trying to recognize are like this one:
2016-01-08 14:01:02 INFO TaskSetManager: Starting task 1.0:0 as TID 0 on executor 1
Your regex ends with:
" " + "as TID" +
" " + taskUIdPattern +
"\\z";
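but your sample line continues with "on executor 1" after the part matched by taskUIdPattern. Because the pattern ends with \\z and matches() only succeeds when the whole line matches, that trailing text makes the match fail. You have to add " on executor 1" or, better, " on executor \\d+" to your regex after taskUIdPattern.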
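As a minimal sketch of that fix, the pattern below appends the executor part and is run against the sample line from the question. Two further differences from the posted code look necessary for that exact line and are assumptions about what was intended: the sample timestamp has no ",\d{3}" milliseconds part, so it is made optional here, and the parsing code reads a group named "taskId" that the posted pattern never defines, so one is added. The class name StartTaskLineDemo and the group names taskUId and executorId are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartTaskLineDemo {
    static final String DATE = "\\d{4}-\\d{2}-\\d{2}";
    // Assumption: milliseconds are optional, because the sample line has none.
    static final String TIME = "\\d{2}:\\d{2}:\\d{2}(?:,\\d{3})?";
    static final String TIMESTAMP = "(?<timeStamp>" + DATE + "\\s" + TIME + ")";
    static final String LOG_LEVEL = "(?<logLevel>\\w+)";
    // The colon is kept outside the group so the captured class name excludes it.
    static final String LOGGING_CLASS = "(?<loggingClass>\\w+):";
    // Named group added so that m.group("taskId") works; the dot is escaped.
    static final String TASK_ID = "(?<taskId>\\d+\\.\\d+:\\d+)";
    static final String TASK_UID = "(?<taskUId>\\d+)";
    // The part the answer asks for: the text that follows the TID.
    static final String EXECUTOR = "on executor (?<executorId>\\d+)";

    public static final Pattern START_TASK = Pattern.compile(
            TIMESTAMP
            + " " + LOG_LEVEL
            + " " + LOGGING_CLASS
            + " " + "Starting task"
            + " " + TASK_ID
            + " " + "as TID"
            + " " + TASK_UID
            + " " + EXECUTOR
            + "\\z");

    public static void main(String[] args) {
        String line = "2016-01-08 14:01:02 INFO TaskSetManager: "
                + "Starting task 1.0:0 as TID 0 on executor 1";
        Matcher m = START_TASK.matcher(line);
        if (m.matches()) {
            // Print each captured field on its own line.
            System.out.println("timeStamp    = " + m.group("timeStamp"));
            System.out.println("logLevel     = " + m.group("logLevel"));
            System.out.println("loggingClass = " + m.group("loggingClass"));
            System.out.println("taskId       = " + m.group("taskId"));
            System.out.println("taskUId      = " + m.group("taskUId"));
            System.out.println("executorId   = " + m.group("executorId"));
        } else {
            System.out.println("no match: " + line);
        }
    }
}
```

Since matches() already requires the whole line to match, the trailing \\z is redundant here; it is kept only to stay close to the original pattern.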