Splitting logfile with Regular Expressions

I have lines like this one in my log file, but I am having trouble with my regular expression:

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"

This is my code in a NetBeans project:

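import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegExp1 {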

public static void main(String argv[]) {
    FileReader myFile = null;
    BufferedReader buff = null;

    String logEntryPattern = "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) [a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/] +\\s[+\\-]\\d{4})\\] \\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\\\" \"(.+?)\\\"";  
    System.out.println("Using RE Pattern:");
    System.out.println(logEntryPattern);

    Pattern p = Pattern.compile(logEntryPattern);

    try {
        myFile = new FileReader("e3600_access_log2016-05-24.log");
        buff = new BufferedReader(myFile);

        while (true) {
            String line = buff.readLine();
            if (line == null) {
                break;
            }

            Matcher matcher = p.matcher(line);
            System.out.println("groups: " + matcher.groupCount());
            if (!matcher.matches()) {
                System.err.println(line + matcher.toString());
                return;
            }

            System.out.println("%a Remote IP Address     : " + matcher.group(1));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
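        try {
            // Either reader is still null if opening the file failed.
            if (buff != null) {
                buff.close();
            }
            if (myFile != null) {
                myFile.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
}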

As a result I get this:

Using RE Pattern:
^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\"
groups: 17
127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"java.util.regex.Matcher[pattern=^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\" region=0,427 lastmatch=]

Any help is appreciated: what am I doing wrong, and what should I fix so that I get the result I expect? Thanks.

Your pattern does not match the log entry. Use a tool such as http://regexr.com/ to debug your regular expressions.

This modified pattern matches your sample input:

^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/]+ [+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\"  "(.+?)\"

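That probably will not solve all of your problems, though: the pattern still looks fragile. Test it against more entries and adjust it as needed.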
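Comparing the two patterns, the modified one differs in two places: the date group uses [\w:/]+ where the original had [\w:/] followed by " +\s" (which allows only a single date character), and there are two spaces before the final quoted field. The following is a minimal sketch for checking this locally with Matcher.matches(). The class name PatternCheck and its helper methods are made up for illustration, and the sample entry is the one printed in the question's output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCheck {
    // Sample entry copied from the question's output; note the two spaces
    // before the last quoted field.
    public static final String LINE =
        "127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\"  "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    // Parts that are identical in both patterns.
    static final String HEAD =
        "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) "
        + "[a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?";
    static final String MIDDLE =
        " \\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\\\"";

    // Pattern from the question: the date group accepts only one [\w:/] character.
    public static final String ORIGINAL =
        HEAD + "\\[([\\w:/] +\\s[+\\-]\\d{4})\\]" + MIDDLE + " \"(.+?)\\\"";

    // Modified pattern: [\w:/]+ covers the whole timestamp, and the last
    // field is preceded by two spaces.
    public static final String FIXED =
        HEAD + "\\[([\\w:/]+ [+\\-]\\d{4})\\]" + MIDDLE + "  \"(.+?)\\\"";

    // True if the regex matches the entire line.
    public static boolean matches(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    // Returns the given capture group, or null if the line does not match.
    public static String group(String regex, String line, int index) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        System.out.println("original matches: " + matches(ORIGINAL, LINE));
        System.out.println("modified matches: " + matches(FIXED, LINE));
        System.out.println("timestamp group : " + group(FIXED, LINE, 10));
    }
}
```

Running the same check over a handful of real entries from the log file is a quick way to find lines where the pattern is still too strict, for example lines with a single space before the last field.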

Description

This regular expression will do the following:

  • Match all of the substrings in the log message
  • Place each matched substring into its own capture group

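Note: to use this regular expression in Java, you need to replace every \ with \\. I also left the expression that matches each substring on its own line. If you use the expression in this format, you need to set the "ignore whitespace" flag (Pattern.COMMENTS in Java), or simply join the expression into a single line. The expression also relies on case-insensitive matching, because it uses lowercase forms such as [a-z]+, may and [0-9a-f]+ to match GET, May and the uppercase session ID in the sample. Keep in mind that this expression does not exhaustively validate the date or IP address substrings.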

^
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
([0-9]+)\s+
([0-9]+)\s+
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+
-\s+
([a-z]+\s[0-9]+)\s+
(\?[^\s]+)\s+
-\s+
\[([0-9]{1,2}\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\/[0-9]{4}(?::[0-9]{2}){3}\s+\+[0-9]{4})\]\s+
"([^"]+)"\s+
([0-9]+)\s+
([^\s]+)\s+
([0-9]+)\s+
([0-9a-f]+)\s+
"([^"]+)"\s+
"([^"]+)"

Live Demo

https://regex101.com/r/mX7gG2/1
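
The same expression can also be tried from Java. The sketch below is one possible way to do it, not the answer author's code: it keeps the expression in its one-field-per-line form, doubles every backslash, and compiles it with Pattern.COMMENTS (ignore whitespace) and Pattern.CASE_INSENSITIVE. The class name LogLineParser, the parse method and the SAMPLE constant are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // One sub-expression per line, as in the answer. Pattern.COMMENTS makes the
    // engine ignore the line breaks; Pattern.CASE_INSENSITIVE lets [a-z]+, "may"
    // and [0-9a-f]+ match "GET", "May" and the uppercase session ID.
    static final Pattern LOG_ENTRY = Pattern.compile(String.join("\n",
        "^",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "([0-9]+)\\s+",
        "([0-9]+)\\s+",
        "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+",
        "-\\s+",
        "([a-z]+\\s[0-9]+)\\s+",
        "(\\?[^\\s]+)\\s+",
        "-\\s+",
        "\\[([0-9]{1,2}\\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)"
            + "\\/[0-9]{4}(?::[0-9]{2}){3}\\s+\\+[0-9]{4})\\]\\s+",
        "\"([^\"]+)\"\\s+",
        "([0-9]+)\\s+",
        "([^\\s]+)\\s+",
        "([0-9]+)\\s+",
        "([0-9a-f]+)\\s+",
        "\"([^\"]+)\"\\s+",
        "\"([^\"]+)\""),
        Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    // Sample entry from the question.
    public static final String SAMPLE =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\"  "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    // Returns the captured fields in order, or an empty array if the line
    // does not match.
    public static String[] parse(String line) {
        Matcher m = LOG_ENTRY.matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE);
        for (int i = 0; i < fields.length; i++) {
            System.out.println("[" + (i + 1) + "] = " + fields[i]);
        }
    }
}
```

The month alternation is copied unchanged, including July; a log that abbreviates that month as Jul would need the alternative shortened.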

Sample text

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"

Sample Matches

[0][0] = 127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"  "http://127.0.0.1:8080/CRUDProject/StudentController.do"
[0][1] = 127.0.0.1
[0][2] = 192.168.1.1
[0][3] = 1050
[0][4] = 1050
[0][5] = 127.0.0.1
[0][6] = GET 8080
[0][7] = ?action=edit&studentId=1
[0][8] = 24/May/2016:19:33:52 +0300
[0][9] = GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1
[0][10] = 200
[0][11] = /CRUDProject/StudentController.do
[0][12] = 264
[0][13] = ABADDD8AFB03ECC4791D76E543290226
[0][14] = Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
[0][15] = http://127.0.0.1:8080/CRUDProject/StudentController.do

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    [0-9]{1,3}               any character of: '0' to '9' (between 1
                             and 3 times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    \?                       '?'
----------------------------------------------------------------------
    [^\s]+                   any character except: whitespace (\n,
                             \r, \t, \f, and " ") (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \7
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (                        group and capture to \8:
----------------------------------------------------------------------
    [0-9]{1,2}               any character of: '0' to '9' (between 1
                             and 2 times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      Jan                      'Jan'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      feb                      'feb'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Mar                      'Mar'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      apr                      'apr'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      may                      'may'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Jun                      'Jun'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      July                     'July'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Aug                      'Aug'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Sep                      'Sep'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Oct                      'Oct'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Nov                      'Nov'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Dec                      'Dec'
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
    (?:                      group, but do not capture (3 times):
----------------------------------------------------------------------
      :                        ':'
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    ){3}                     end of grouping
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \+                       '+'
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
  )                        end of \8
----------------------------------------------------------------------
  \]                       ']'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \9:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \9
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \10:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \10
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \11:
----------------------------------------------------------------------
    [^\s]+                   any character except: whitespace (\n,
                             \r, \t, \f, and " ") (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \11
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \12:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \12
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \13:
----------------------------------------------------------------------
    [0-9a-f]+                any character of: '0' to '9', 'a' to 'f'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \13
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \14:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \14
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \15:
----------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \15
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------

