
Need to create regex for analyzing Rails server log

I have a Rails server log file in the following format.

Started <REQUEST_TYPE_1> <URL_1> for <IP_1> at <TIMESTAMP_1>
  Processing by <controller#action_1> as <REQUEST_FORMAT_1>
  Parameters: <parameters_1>
<Some logs from code>
Rendered <some_template_1> (<timetaken_1>)
Completed <RESPONSE_CODE_1> in <TIME_1>


Started <REQUEST_TYPE_2> <URL_2> for <IP_2> at <TIMESTAMP_2>
  Processing by <controller#action_2> as <REQUEST_FORMAT_2>
  Parameters: <parameters_2>
<Some logs from code>
Completed <RESPONSE_CODE_2> in <TIME_2>

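Now I need to parse this log and extract every REQUEST_TYPE, URL, IP, TIMESTAMP, REQUEST_FORMAT, and RESPONSE_CODE from it. I am struggling to write a good regex for this in Java or Ruby. The <> characters do not appear in the actual input; I added them for readability and to mask the real data.

Sample request: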

Started GET "/google.com/2" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015
  Processing by MyController#method as JS
  Parameters: {"abc" => "xyz"}
[LOG] 3 : User text log
Completed 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)


Started POST "/google.com/543" for 127.0.1.1 at Tue Dec 01 13:13:16 +0530 2015
  Processing by MyController#method_2 as JSON
  Parameters: {"efg" => "uvw"}
Completed 404 Not Authorized in 65ms (Views: 1.5ms | ActiveRecord: 1.0ms)

Expected output:

request_types = ['GET', 'POST']
urls = ['/google.com/2','/google.com/543']
ips = ['127.0.0.1','127.0.1.1']
timestamps = ['Tue Dec 01 12:01:13 +0530 2015','Tue Dec 01 13:13:16 +0530 2015']
request_formats = ['JS','JSON']
response_codes = ['200 OK','404 Not Authorized']

I was able to write the following regexes, but they do not work correctly.

request_types = /Started \w+/  //Expected array of all request types
urls = /"\/.*\/"/ //Expected array of all urls types
ips = /"d{1,3}.d{1,3}.d{1,3}.d{1,3}"/ //Expected array of all ips types
timestamps =  /at \w+/
request_formats =/as \w+/
response_codes = /Completed \w+/

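I would appreciate some help creating regexes that extract these fields from the given input in Java or Ruby. I would prefer Java if possible.

Here is a Java snippet that shows how to collect the details from the log into separate array lists (it assumes imports of java.util.ArrayList, java.util.List, java.util.regex.Matcher, and java.util.regex.Pattern):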

String re = "(?sm)^Started\\s+(?<requesttype>\\S+)\\s+\"(?<url>\\S+)\"\\s+for\\s+(?<ip>\\d+(?:\\.\\d+)+)\\s+at\\s+(?<tsp>[a-zA-Z]+\\s+[a-zA-Z]+\\s+\\d+\\s+\\d+:\\d+:\\d+\\s+\\+\\d+\\s\\d{4})\\s+(?:Processing\\s+by\\s+\\S+)\\s+as\\s+(?<requestformat>\\S+)(?:\\s+Parameters:\\s+\\S+)?(?:(?:(?:(?!\nStarted ).)*Completed\\s)(?<responsecode>\\d+(?:(?!\\sin\\s).)*))?";
String str = "Started GET \"/google.com/2\" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015\n  Processing by MyController#method as JS\n  Parameters: {\"abc\" => \"xyz\"}\n[LOG] 3 : User text log\nCompleted 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)\n\n\nStarted POST \"/google.com/543\" for 127.0.1.1 at Tue Dec 01 13:13:16 +0530 2015\n  Processing by MyController#method_2 as JSON\n  Parameters: {\"efg\" => \"uvw\"}\nCompleted 404 Not Authorized in 65ms (Views: 1.5ms | ActiveRecord: 1.0ms)";
Pattern pattern = Pattern.compile(re);
Matcher matcher = pattern.matcher(str);
List<String> requesttypes = new ArrayList<String>();
List<String> urls = new ArrayList<String>();
List<String> ips = new ArrayList<String>();
List<String> timestamps = new ArrayList<String>(); 
List<String> requestformats = new ArrayList<String>(); 
List<String> responsecodes = new ArrayList<String>();
while (matcher.find()){
    requesttypes.add(matcher.group("requesttype"));
    urls.add(matcher.group("url"));
    ips.add(matcher.group("ip"));
    timestamps.add(matcher.group("tsp"));
    requestformats.add(matcher.group("requestformat"));
    responsecodes.add(matcher.group("responsecode"));
    System.out.println("-----------------------");
    System.out.println(matcher.group("requesttype"));
    System.out.println(matcher.group("url")); 
    System.out.println(matcher.group("ip")); 
    System.out.println(matcher.group("tsp")); 
    System.out.println(matcher.group("requestformat")); 
    System.out.println(matcher.group("responsecode")); 
} 

See the IDEONE demo. Once matching is done, you can also print the whole lists, e.g. System.out.println(urls):

System.out.println(requesttypes);
System.out.println(urls);
System.out.println(ips);
System.out.println(timestamps);
System.out.println(requestformats);
System.out.println(responsecodes);

See this demo. The output is:

[GET, POST]
[/google.com/2, /google.com/543]
[127.0.0.1, 127.0.1.1]
[Tue Dec 01 12:01:13 +0530 2015, Tue Dec 01 13:13:16 +0530 2015]
[JS, JSON]
[200 OK, 404 Not Authorized]
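For comparison, here is a minimal sketch of the one-pattern-per-field approach attempted in the question, with the obvious problems fixed: digits are written as \\d rather than d, literal dots are escaped, the quotes are kept outside the captured URL, and each pattern has a capturing group so that only the wanted text is collected. The class name PerFieldPatterns, the constant names, and the helper findAll are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerFieldPatterns {
    // One small pattern per field; group 1 holds the value to collect.
    static final String REQUEST_TYPE   = "^Started (\\w+)";
    static final String URL            = "^Started \\w+ \"([^\"]+)\"";
    static final String IP             = "^Started .+ for (\\d{1,3}(?:\\.\\d{1,3}){3}) at ";
    static final String TIMESTAMP      = "^Started .+ at (.+)$";
    static final String REQUEST_FORMAT = "^[ \\t]*Processing by \\S+ as (\\w+)";
    static final String RESPONSE_CODE  = "^Completed (\\d+.*?) in ";

    // Returns capture group 1 of every match of regex in log.
    // MULTILINE makes ^ and $ match at the start and end of each line.
    static List<String> findAll(String regex, String log) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(log);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String log =
              "Started GET \"/google.com/2\" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015\n"
            + "  Processing by MyController#method as JS\n"
            + "  Parameters: {\"abc\" => \"xyz\"}\n"
            + "[LOG] 3 : User text log\n"
            + "Completed 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)\n";
        System.out.println(findAll(REQUEST_TYPE, log));
        System.out.println(findAll(URL, log));
        System.out.println(findAll(IP, log));
        System.out.println(findAll(TIMESTAMP, log));
        System.out.println(findAll(REQUEST_FORMAT, log));
        System.out.println(findAll(RESPONSE_CODE, log));
    }
}
```

The trade-off is that each list is filled independently, so if one request has no Completed line the lists no longer line up by index. The single combined regex above avoids that by matching one whole request per iteration.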

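What the regex matches (tokens are shown as they appear in the Java string literal):

  • (?sm)^ - the start of a line (^ matches at line starts because of the (?m) option; (?s) lets . match newlines as well)
  • Started\\s+ - the literal string Started and 1+ whitespace characters
  • (?<requesttype>\\S+) - group "requesttype", holding 1+ non-whitespace characters
  • \\s+\" - 1+ whitespace characters followed by "
  • (?<url>\\S+) - group "url", holding 1+ non-whitespace characters
  • \"\\s+for\\s+ - " followed by 1+ whitespace characters, for, and 1+ whitespace characters
  • (?<ip>\\d+(?:\\.\\d+)+) - group "ip", holding 1+ digits followed by 1 or more sequences of . and 1+ digits
  • \\s+at\\s+ - the word at surrounded by whitespace
  • (?<tsp>[a-zA-Z]+\\s+[a-zA-Z]+\\s+\\d+\\s+\\d+:\\d+:\\d+\\s+\\+\\d+\\s\\d{4}) - group "tsp" (the timestamp), holding letters and digits in the order, and with the whitespace separators, shown in the sample input
  • \\s+ - 1+ whitespace characters
  • (?:Processing\\s+by\\s+\\S+)\\s+as\\s+ - Processing by followed by a word (1+ non-whitespace characters), then the word as surrounded by whitespace
  • (?<requestformat>\\S+) - group "requestformat", consisting of 1+ non-whitespace characters
  • (?:\\s+Parameters:\\s+\\S+)? - an optional Parameters: followed by whitespace and 1+ non-whitespace characters
  • (?:(?:(?:(?!\nStarted ).)*Completed\\s)(?<responsecode>\\d+(?:(?!\\sin\\s).)*))? - an optional group (because it is wrapped in (?:...)?) that matches any characters up to Completed without running into the next Started (thanks to the tempered greedy token (?:(?!\nStarted ).)*), then matches Completed followed by a whitespace character, after which (?<responsecode>\\d+(?:(?!\\sin\\s).)*) captures into group "responsecode" 1+ digits followed by any characters up to the whole word in surrounded by whitespace.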
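To see the tempered greedy token from the last item in isolation, the following sketch applies only the "responsecode" part of the regex to a single Completed line. The class name TemperedTokenDemo and the method responseCode are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedTokenDemo {
    // (?:(?!\sin\s).)* consumes one character at a time, but only while the
    // text at the current position does not start with whitespace + "in" + whitespace.
    private static final Pattern RESPONSE =
        Pattern.compile("Completed\\s(?<responsecode>\\d+(?:(?!\\sin\\s).)*)");

    // Returns the status text after "Completed ", or null if the line has none.
    static String responseCode(String line) {
        Matcher m = RESPONSE.matcher(line);
        return m.find() ? m.group("responsecode") : null;
    }

    public static void main(String[] args) {
        // prints: 404 Not Authorized
        System.out.println(responseCode(
            "Completed 404 Not Authorized in 65ms (Views: 1.5ms | ActiveRecord: 1.0ms)"));
        // prints: 200 OK
        System.out.println(responseCode(
            "Completed 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)"));
    }
}
```

Because the lookahead is checked before every character, the capture stops right before " in " even though the status text itself may contain spaces.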

