
Splitting logfile with Regular Expressions

I have lines like this one in a log file, but I am having problems with my regular expression.

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"

Here is my code in a NetBeans project:

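import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegExp1 {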

public static void main(String argv[]) {
    FileReader myFile = null;
    BufferedReader buff = null;

    String logEntryPattern = "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) [a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/] +\\s[+\\-]\\d{4})\\] \\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\\\" \"(.+?)\\\"";  
    System.out.println("Using RE Pattern:");
    System.out.println(logEntryPattern);

    Pattern p = Pattern.compile(logEntryPattern);

    try {
        myFile = new FileReader("e3600_access_log2016-05-24.log");
        buff = new BufferedReader(myFile);

        while (true) {
            String line = buff.readLine();
            if (line == null) {
                break;
            }

            Matcher matcher = p.matcher(line);
            System.out.println("groups: " + matcher.groupCount());
            if (!matcher.matches()) {
                System.err.println(line + matcher.toString());
                return;
            }

            System.out.println("%a Remote IP Address     : " + matcher.group(1));
        }
    } catch (IOException e) {
        e.printStackTrace();
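    } finally {
        try {
            // Either reader is still null if opening the file failed.
            if (buff != null) {
                buff.close();
            }
            if (myFile != null) {
                myFile.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
}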

As a result I get this:

Using RE Pattern:
^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\"
groups: 17
127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"java.util.regex.Matcher[pattern=^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\" region=0,427 lastmatch=]

Any help is appreciated on what I am doing wrong and what I should fix to get the expected results. Thanks.

Your pattern does not match the log entries. Use a tool like http://regexr.com/ to debug regexes.

This modified pattern matches your sample input:

^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/]+ [+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\"  "(.+?)\"

That will probably not solve all your problems; the pattern still looks flaky. Test it against more log lines and adapt it as needed.
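As a quick check, the modified pattern can be run against the line printed in the question's output. This is only a minimal sketch: the class name `ModifiedPatternCheck` and the `parse` helper are made up for illustration, the sample line is hard-coded instead of read from the log file, and the regex `\"` is written as a plain `"` because both match a literal quote. Note the two spaces before the last quoted field, which the modified pattern expects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModifiedPatternCheck {

    // The modified pattern from the answer, escaped for a Java string literal.
    public static final Pattern MODIFIED = Pattern.compile(
            "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) "
            + "[a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/]+ [+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) "
            + "(\\S+) ([\\d]+) (\\S+) \"(.+?)\"  \"(.+?)\"");

    // The log line as it appears in the question's output (two spaces before the referer).
    public static final String SAMPLE =
            "127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
            + "[24/May/2016:19:33:52 +0300] "
            + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" 200 "
            + "/CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
            + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
            + "Chrome/50.0.2661.102 Safari/537.36\"  "
            + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    // Returns capture groups 1..n as an array, or null if the whole line does not match.
    public static String[] parse(String line) {
        Matcher m = MODIFIED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE);
        if (fields == null) {
            System.out.println("no match");
            return;
        }
        System.out.println("Remote IP address : " + fields[0]);
        System.out.println("Timestamp         : " + fields[9]);
        System.out.println("Status code       : " + fields[11]);
    }
}
```

Because the pattern relies on literal spaces, a line with a single space between the last two quoted fields would make `parse` return null, which is one reason the pattern is worth testing against more of the real log.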

Description

This regular expression will do the following:

  • match all the substrings in your log message
  • place each matched substring in its own capture group

Note: to use this regex in Java, you'll need to replace each \ with \\ . I've also left the expressions that match each substring on their own lines. If you use the expression in this format you'll need to include the Ignore White Space flag, or simply make the expression a single line. The letter classes and month names are written in lowercase or mixed case (for example, [a-z]+ has to match GET), so the case-insensitive flag is needed as well. Keep in mind this expression does not do an exhaustive validation on the date or IP address substrings.

^
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
([0-9]+)\s+
([0-9]+)\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
-\s+
([a-z]+\s[0-9]+)\s+
(\?[^\s]+)\s+
-\s+
\[([0-9]{1,2}\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\/[0-9]{4}(?::[0-9]{2}){3}\s+\+[0-9]{4})\]\s+
"([^"]+)"\s+
([0-9]+)\s+
([^\s]+)\s+
([0-9]+)\s+
([0-9a-f]+)\s+
"([^"]+)"\s+
"([^"]+)"
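The note about escaping and flags can be sketched in Java roughly as follows. This is an illustration under assumptions, not the answer's own code: the class name `VerboseLogPattern` and the `parse` helper are hypothetical, `Pattern.COMMENTS` stands in for the Ignore White Space flag, and `Pattern.CASE_INSENSITIVE` is added because the lowercase classes have to match `GET`, `May` and the uppercase session ID.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerboseLogPattern {

    // The multi-line expression, one sub-pattern per line, with every \ doubled for Java.
    public static final String REGEX = String.join("\n",
            "^",
            "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
            "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
            "([0-9]+)\\s+",
            "([0-9]+)\\s+",
            "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
            "-\\s+",
            "([a-z]+\\s[0-9]+)\\s+",
            "(\\?[^\\s]+)\\s+",
            "-\\s+",
            "\\[([0-9]{1,2}\\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\\/[0-9]{4}(?::[0-9]{2}){3}\\s+\\+[0-9]{4})\\]\\s+",
            "\"([^\"]+)\"\\s+",
            "([0-9]+)\\s+",
            "([^\\s]+)\\s+",
            "([0-9]+)\\s+",
            "([0-9a-f]+)\\s+",
            "\"([^\"]+)\"\\s+",
            "\"([^\"]+)\"");

    // COMMENTS makes the regex engine ignore the newlines used for layout.
    public static final Pattern LOG_LINE =
            Pattern.compile(REGEX, Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // Sample line taken from the question.
    public static final String SAMPLE =
            "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
            + "[24/May/2016:19:33:52 +0300] "
            + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" 200 "
            + "/CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
            + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
            + "Chrome/50.0.2661.102 Safari/537.36\" "
            + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    // Returns capture groups 1..n as an array, or null if the line does not match.
    public static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE);
        if (fields == null) {
            System.out.println("no match");
            return;
        }
        for (int i = 0; i < fields.length; i++) {
            System.out.println("group " + (i + 1) + " = " + fields[i]);
        }
    }
}
```

Since the sub-patterns are separated by `\s+` rather than literal spaces, this version tolerates one or more spaces between fields. Compiling the same string without `Pattern.CASE_INSENSITIVE` should fail on the sample, because `[a-z]+` cannot match `GET`.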


Example

Live Demo

https://regex101.com/r/mX7gG2/1

Sample text

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"

Sample Matches

[0][0] = 127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"
[0][1] = 127.0.0.1
[0][2] = 192.168.1.1
[0][3] = 1050
[0][4] = 1050
[0][5] = 127.0.0.1
[0][6] = GET 8080
[0][7] = ?action=edit&studentId=1
[0][8] = 24/May/2016:19:33:52 +0300
[0][9] = GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1
[0][10] = 200
[0][11] = /CRUDProject/StudentController.do
[0][12] = 264
[0][13] = ABADDD8AFB03ECC4791D76E543290226
[0][14] = Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
[0][15] = http://127.0.0.1:8080/CRUDProject/StudentController.do

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    \?                       '?'
----------------------------------------------------------------------
    [^\s]+                   any character except: whitespace (\n,
                             \r, \t, \f, and " ") (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \7
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (                        group and capture to \8:
----------------------------------------------------------------------
    [0-9]{1,2}               any character of: '0' to '9' (between 1
                             and 2 times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      Jan                      'Jan'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      feb                      'feb'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Mar                      'Mar'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      apr                      'apr'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      may                      'may'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Jun                      'Jun'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      July                     'July'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Aug                      'Aug'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Sep                      'Sep'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Oct                      'Oct'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Nov                      'Nov'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Dec                      'Dec'
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      :                        ':'
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \+                       '+'
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
  )                        end of \8
----------------------------------------------------------------------
  \]                       ']'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \9:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \9
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \10:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \10
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \11:
----------------------------------------------------------------------
    [^\s]+                   any character except: whitespace (\n,
                             \r, \t, \f, and " ") (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \11
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \12:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \12
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \13:
----------------------------------------------------------------------
    [0-9a-f]+                any character of: '0' to '9', 'a' to 'f'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \13
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \14:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \14
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \15:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \15
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
