
Extract regex match up to the next sequence in a log file

I have the log file below, and I need to define the log format with a regex so that I can use it to extract log entries.

_20131005_022047874 ALEPO@ALEPO3 **Exception ServiceConnection / createService methord javax.xml.ws.WebServiceException: Failed to access the WSDL at: http://212.118.158.21:8080/tunnel-web/axis/Portlet_ase_FunctionalDomainService?wsdl. It failed with: 
    Connection refused.
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.tryWithMex(RuntimeWSDLParser.java:151)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.parse(RuntimeWSDLParser.java:133)
    at com.sun.xml.internal.ws.client.WSServiceDelegate.parseWSDL(WSServiceDelegate.java:254)
    at com.sun.xml.internal.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:217)
    at com.sun.xml.internal.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:165)
    at com.sun.xml.internal.ws.spi.ProviderImpl.createServiceDelegate(ProviderImpl.java:93)
    at javax.xml.ws.Service.<init>(Service.java:56)
    at javax.xml.ws.Service.create(Service.java:680)
    at com.stc.alepo.client.ServiceConnection.createService(ServiceConnection.java:75)
    at com.stc.alepo.client.WSSoapHandler.<init>(WSSoapHandler.java:73)
    at com.stc.alepo.client.WSProcessManager.<init>(WSProcessManager.java:114)
    at com.stc.alepo.client.IcmsAlepoRealTime.start(IcmsAlepoRealTime.java:439)
    at com.stc.alepo.client.IcmsAlepoRealTime.main(IcmsAlepoRealTime.java:97)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.java:478)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.http.HttpClient.New(HttpClient.java:323)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
    at java.net.URL.openStream(URL.java:1010)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.createReader(RuntimeWSDLParser.java:793)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.resolveWSDL(RuntimeWSDLParser.java:251)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.parse(RuntimeWSDLParser.java:118)
    ... 11 more

_20131005_022047874 ALEPO@ALEPO3 **Exception DCPSoapHandler / constructor methord [Ljava.lang.StackTraceElement;@25b65b7f
_20131005_022047875 ALEPO@ALEPO3 WS17249866 **Exception DCPSoapHandler / invokeSOAPMessage methord java.lang.NullPointerException
    at com.stc.alepo.client.WSSoapHandler.invokeSOAPMessage(WSSoapHandler.java:110)
    at com.stc.alepo.client.WSProcessManager.getWSReply(WSProcessManager.java:174)
    at com.stc.alepo.client.IcmsAlepoRealTime.start(IcmsAlepoRealTime.java:441)
    at com.stc.alepo.client.IcmsAlepoRealTime.main(IcmsAlepoRealTime.java:97)

I have defined the regex below to match the timestamp plus the first line of each entry, but I need the second group to capture the rest of the message, including the continuation lines:

(_\d{1,8}_\w+) (.*)

How do I make the second group extract all characters until the first group occurs again, or what is the best practice for this use case? I have many logs and need to define the second group the same way, although the timestamp format may change from log to log.

Thanks in advance.

You may use a regex that captures the timestamp into Group 1 and all the lines after it that do not start with the timestamp pattern into Group 2:

/^(_\d{1,8}_\w+)\s*(.*(?:\r?\n(?!_\d{1,8}_\w+).*)*)/gm

See the regex demo.

Details:

  • ^ - start of a line (the m flag makes ^ match at the start of every line)
  • (_\d{1,8}_\w+) - Group 1 (timestamp): _, 1 to 8 digits, _ and 1+ word chars
  • \s* - 0+ whitespace chars
  • (.*(?:\r?\n(?!_\d{1,8}_\w+).*)*) - Group 2 (everything up to the next timestamp):
    • .* - any 0+ chars other than line break chars
    • (?:\r?\n(?!_\d{1,8}_\w+).*)* - 0+ sequences of:
      • \r?\n(?!_\d{1,8}_\w+) - a line break not followed by the timestamp pattern
      • .* - any 0+ chars other than line break chars
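
Since the question mentions that the timestamp format may change from log to log, here is a minimal Python sketch of the same pattern with the timestamp part factored out as a parameter. The names `TIMESTAMP`, `build_entry_regex`, `extract_entries` and the shortened `SAMPLE` text are illustrative assumptions, not part of the original answer; the regex itself is the one shown above, with `re.MULTILINE` playing the role of the `m` flag and `finditer` the role of `g`.

```python
import re

# Timestamp pattern from the answer above; replace it for logs with another format.
TIMESTAMP = r"_\d{1,8}_\w+"


def build_entry_regex(timestamp=TIMESTAMP):
    """Compile a regex: group 1 = timestamp, group 2 = rest of the entry."""
    return re.compile(
        rf"^({timestamp})\s*(.*(?:\r?\n(?!{timestamp}).*)*)",
        re.MULTILINE,  # let ^ match at the start of every line
    )


def extract_entries(text, timestamp=TIMESTAMP):
    """Return (timestamp, message) pairs; the message keeps its continuation lines."""
    pattern = build_entry_regex(timestamp)
    return [(m.group(1), m.group(2).rstrip()) for m in pattern.finditer(text)]


# Shortened, hypothetical sample in the same shape as the log in the question.
SAMPLE = """\
_20131005_022047874 ALEPO@ALEPO3 **Exception first message
    at com.example.Foo.bar(Foo.java:1)
Caused by: java.net.ConnectException: Connection refused
    ... 11 more

_20131005_022047875 ALEPO@ALEPO3 **Exception second message
"""

if __name__ == "__main__":
    for stamp, message in extract_entries(SAMPLE):
        print(stamp)
        print(message)
        print("-" * 40)
```

For a log with a different timestamp layout, only the `timestamp` argument needs to change, for example `extract_entries(text, r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")`. The pattern passed in should not contain capturing groups of its own, otherwise the group numbers shift.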
