
Regular expression to parse a log file and find stacktraces

I'm working with a legacy Java app that has no logging and just prints all information to the console. Most exceptions are also "handled" by just doing a printStackTrace() call.

In a nutshell, I've just redirected the System.out and System.err streams to a log file, and now I need to parse that log file. So far so good, but I'm having problems parsing the log file for stack traces.

Some of the code is obfuscated as well, so I need to run the stack traces through a utility app to deobfuscate them. I'm trying to automate all of this.

The closest I've come so far is to get the initial Exception line using this:

.+Exception[^\n]+

And finding the "at ..(..)" lines using:

(\t+\Qat \E.+\s+)+

But I can't figure out how to put them together to get the full stacktrace.

Basically, the log file looks something like the following. There is no fixed structure, and the lines before and after stack traces are completely random:

Modem ERROR (AT
Owner: CoreTalk
) - TIMEOUT
IN []
Try Open: COM3


javax.comm.PortInUseException: Port currently owned by CoreTalk
    at javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)
...
    at UniPort.modemService.run(modemService.java:103)
Handling file: C:\Program Files\BackBone Technologies\CoreTalk 2006\InputXML\notify
java.io.FileNotFoundException: C:\Program Files\BackBone Technologies\CoreTalk 2006\InputXML\notify (The system cannot find the file specified)
    at java.io.FileInputStream.open(Native Method)
...
    at com.gobackbone.Store.a.a.handle(Unknown Source)
    at com.jniwrapper.win32.io.FileSystemWatcher.fireFileSystemEvent(FileSystemWatcher.java:223)
...
    at java.lang.Thread.run(Unknown Source)
Load Additional Ports
... Lots of random stuff
IN []

[Fatal Error] .xml:6:114: The entity name must immediately follow the '&' in the entity reference.
org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
...
    at com.gobackbone.Store.a.a.run(Unknown Source)

Looks like you just need to paste them together (and use a newline as glue):

.+Exception[^\n]+\n(\t+\Qat \E.+\s+)+

But I would change your regex a bit:

^.+Exception[^\n]++(\s+at .++)+

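This folds the whitespace between the at... lines into a single \s+ and uses possessive quantifiers to avoid backtracking. Note that the leading ^ only anchors at the start of every line if the pattern is compiled in multiline mode (Pattern.MULTILINE in Java).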
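As a rough illustration of how the pattern above might be applied from Java, here is a minimal sketch. The class and method names (StackTraceFinder, findStackTraces) are made up for this example, and the sample log in main is an abridged version of the one in the question, assuming the "at" lines are indented with tabs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original application.
final class StackTraceFinder {

    // The regex from the answer. MULTILINE makes ^ match at the start of each line.
    private static final Pattern STACK_TRACE = Pattern.compile(
            "^.+Exception[^\\n]++(\\s+at .++)+", Pattern.MULTILINE);

    /** Returns every match (exception line plus its "at" lines) in file order. */
    static List<String> findStackTraces(String log) {
        List<String> traces = new ArrayList<>();
        Matcher matcher = STACK_TRACE.matcher(log);
        while (matcher.find()) {
            traces.add(matcher.group());
        }
        return traces;
    }

    public static void main(String[] args) {
        // Abridged sample based on the log in the question.
        String log = String.join("\n",
                "Try Open: COM3",
                "",
                "javax.comm.PortInUseException: Port currently owned by CoreTalk",
                "\tat javax.comm.CommPortIdentifier.open(CommPortIdentifier.java:337)",
                "\tat UniPort.modemService.run(modemService.java:103)",
                "Handling file: notify",
                "Load Additional Ports");

        for (String trace : findStackTraces(log)) {
            System.out.println(trace);
            System.out.println("----");
        }
    }
}
```

Each returned string could then be handed to the deobfuscation utility. Since the pattern only knows about the header line and "at" lines, a chained "Caused by: ..." section would most likely show up as a separate match rather than as part of the first one.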

We have been using ANTLR to tackle the parsing of log files (in a different application area). It's not trivial, but if this is a critical task for you, it will be better than using regexes.

I get good results using

perl -n -e 'm/(Exception)|(\tat )/ && print' /var/log/jboss4.2/debian/server.log 

It prints every line that contains Exception or a tab followed by "at " (\tat ). Because both patterns are tested in a single pass over the file, the lines come out in their original order.
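If Perl is not available, the same line filter can be approximated in Java. This is only a sketch: the class and method names (TraceLineFilter, filter) are invented for illustration, and it assumes the log has already been read into a list of lines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper mirroring the Perl one-liner above.
final class TraceLineFilter {

    // Same alternation as the Perl match: "Exception" or a tab followed by "at ".
    private static final Pattern INTERESTING = Pattern.compile("Exception|\\tat ");

    /** Keeps only the lines that look like part of a stack trace, in input order. */
    static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(line -> INTERESTING.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "IN []",
                "java.io.FileNotFoundException: notify (The system cannot find the file specified)",
                "\tat java.io.FileInputStream.open(Native Method)",
                "Load Additional Ports");
        filter(lines).forEach(System.out::println);
    }
}
```

Like the Perl version, this works line by line, so it does not group lines into individual traces; it just drops everything that is clearly unrelated.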
