Need to create regex for analyzing Rails server log
I have a Rails server log file in the following format:
Started <REQUEST_TYPE_1> <URL_1> for <IP_1> at <TIMESTAMP_1>
Processing by <controller#action_1> as <REQUEST_FORMAT_1>
Parameters: <parameters_1>
<Some logs from code>
Rendered <some_template_1> (<timetaken_1>)
Completed <RESPONSE_CODE_1> in <TIME_1>
Started <REQUEST_REQUEST_TYPE_2> <URL_2> for <IP_2> at <TIMESTAMP_2>
Processing by <controller#action_2> as <REQUEST_FORMAT_2>
Parameters: <parameters_2>
<Some logs from code>
Completed <RESPONSE_CODE_2> in <TIME_2>
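Now I need to parse this log and extract all REQUEST_TYPE, URL, IP, TIMESTAMP, REQUEST_FORMAT and RESPONSE_CODE values from the log above. I am struggling to create a good regex for this in Java/Ruby. The <> are not present in the actual input; I added them for readability and to mask the actual data.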
Sample requests:
Started GET "/google.com/2" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015
Processing by MyController#method as JS
Parameters: {"abc" => "xyz"}
[LOG] 3 : User text log
Completed 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)
Started POST "/google.com/543" for 127.0.1.1 at Tue Dec 01 13:13:16 +0530 2015
Processing by MyController#method_2 as JSON
Parameters: {"efg" => "uvw"}
Completed 404 Not Authorized in 65ms (Views: 1.5ms | ActiveRecord: 1.0ms)
Expected output:
request_types = ['GET', 'POST']
urls = ['/google.com/2','/google.com/543']
ips = ['127.0.0.1','127.0.1.1']
timestamps = ['Tue Dec 01 12:01:13 +0530 2015','Tue Dec 01 13:13:16 +0530 2015']
request_formats = ['JS','JSON']
response_codes = ['200 OK','404 Not Authorized']
I was able to write the following regexes, but they don't work correctly:
request_types = /Started \w+/ //Expected array of all request types
urls = /"\/.*\/"/ //Expected array of all urls types
ips = /"d{1,3}.d{1,3}.d{1,3}.d{1,3}"/ //Expected array of all ips types
timestamps = /at \w+/
request_formats =/as \w+/
response_codes = /Completed \w+/
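I would like some help creating regexes to extract these parameters from the given input in Java/Ruby. I'd prefer Java if possible.
Here is a Java snippet showing how to collect the details from the log into separate array lists: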
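The per-field attempts above fail for a few concrete reasons: d{1,3} matches the letter d rather than a digit (it needs \d), an unescaped . matches any character, the IP pattern requires quotes that are not in the log, the URL pattern requires a / right before the closing quote, and patterns such as /at \w+/ include the keyword and stop at the first space instead of capturing just the value. As a hedged sketch, the per-field approach can be repaired in Java with one MULTILINE pattern per field whose capturing group 1 holds only the value. The class, constant and method names (PerFieldRegex, findAll, etc.) are illustrative, and the patterns assume each line starts exactly as in the sample (no leading whitespace).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerFieldRegex {
    // The sample requests from the question, one log line per string
    public static final String SAMPLE = String.join("\n",
        "Started GET \"/google.com/2\" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015",
        "Processing by MyController#method as JS",
        "Parameters: {\"abc\" => \"xyz\"}",
        "[LOG] 3 : User text log",
        "Completed 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)",
        "Started POST \"/google.com/543\" for 127.0.1.1 at Tue Dec 01 13:13:16 +0530 2015",
        "Processing by MyController#method_2 as JSON",
        "Parameters: {\"efg\" => \"uvw\"}",
        "Completed 404 Not Authorized in 65ms (Views: 1.5ms | ActiveRecord: 1.0ms)");

    // One pattern per field; group 1 captures only the value, not the keyword
    public static final String REQUEST_TYPE = "^Started (\\S+)";
    public static final String URL = "^Started \\S+ \"([^\"]*)\"";
    public static final String IP = " for (\\d{1,3}(?:\\.\\d{1,3}){3}) at ";
    public static final String TIMESTAMP = "^Started .* at (.+)$";
    public static final String REQUEST_FORMAT = "^Processing by \\S+ as (\\S+)";
    public static final String RESPONSE_CODE = "^Completed (\\d+.*?) in ";

    // Collects group 1 of every match; MULTILINE lets ^ and $ match at each line boundary
    public static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(REQUEST_TYPE, SAMPLE));   // [GET, POST]
        System.out.println(findAll(URL, SAMPLE));            // [/google.com/2, /google.com/543]
        System.out.println(findAll(IP, SAMPLE));             // [127.0.0.1, 127.0.1.1]
        System.out.println(findAll(TIMESTAMP, SAMPLE));      // [Tue Dec 01 12:01:13 +0530 2015, Tue Dec 01 13:13:16 +0530 2015]
        System.out.println(findAll(REQUEST_FORMAT, SAMPLE)); // [JS, JSON]
        System.out.println(findAll(RESPONSE_CODE, SAMPLE));  // [200 OK, 404 Not Authorized]
    }
}
```

One caveat with this design: the six lists stay aligned only if every request has every line. If a Completed line is missing, the response codes shift against the other fields, which is a reason to prefer a single regex that matches one whole request block at a time.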
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RailsLogParser {
    public static void main(String[] args) {
        // One match per request block: from a "Started ..." line up to its "Completed ..." line
        String re = "(?sm)^Started\\s+(?<requesttype>\\S+)\\s+\"(?<url>\\S+)\"\\s+for\\s+(?<ip>\\d+(?:\\.\\d+)+)\\s+at\\s+(?<tsp>[a-zA-Z]+\\s+[a-zA-Z]+\\s+\\d+\\s+\\d+:\\d+:\\d+\\s+\\+\\d+\\s\\d{4})\\s+(?:Processing\\s+by\\s+\\S+)\\s+as\\s+(?<requestformat>\\S+)(?:\\s+Parameters:\\s+\\S+)?(?:(?:(?:(?!\nStarted ).)*Completed\\s)(?<responsecode>\\d+(?:(?!\\sin\\s).)*))?";
        String str = "Started GET \"/google.com/2\" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015\n Processing by MyController#method as JS\n Parameters: {\"abc\" => \"xyz\"}\n[LOG] 3 : User text log\nCompleted 200 OK in 26ms (Views: 3.3ms | ActiveRecord: 2.9ms)\n\n\nStarted POST \"/google.com/543\" for 127.0.1.1 at Tue Dec 01 13:13:16 +0530 2015\n Processing by MyController#method_2 as JSON\n Parameters: {\"efg\" => \"uvw\"}\nCompleted 404 Not Authorized in 65ms (Views: 1.5ms | ActiveRecord: 1.0ms)";
        Pattern pattern = Pattern.compile(re);
        Matcher matcher = pattern.matcher(str);
        List<String> requesttypes = new ArrayList<>();
        List<String> urls = new ArrayList<>();
        List<String> ips = new ArrayList<>();
        List<String> timestamps = new ArrayList<>();
        List<String> requestformats = new ArrayList<>();
        List<String> responsecodes = new ArrayList<>();
        // Each find() consumes one request block; named groups pull out the fields
        while (matcher.find()) {
            requesttypes.add(matcher.group("requesttype"));
            urls.add(matcher.group("url"));
            ips.add(matcher.group("ip"));
            timestamps.add(matcher.group("tsp"));
            requestformats.add(matcher.group("requestformat"));
            responsecodes.add(matcher.group("responsecode"));
            System.out.println("-----------------------");
            System.out.println(matcher.group("requesttype"));
            System.out.println(matcher.group("url"));
            System.out.println(matcher.group("ip"));
            System.out.println(matcher.group("tsp"));
            System.out.println(matcher.group("requestformat"));
            System.out.println(matcher.group("responsecode"));
        }
    }
}
See the IDEONE demo. Once matching is done, you can also print the lists, e.g. System.out.println(urls):
System.out.println(requesttypes);
System.out.println(urls);
System.out.println(ips);
System.out.println(timestamps);
System.out.println(requestformats);
System.out.println(responsecodes);
See this demo. The output is:
[GET, POST]
[/google.com/2, /google.com/543]
[127.0.0.1, 127.0.1.1]
[Tue Dec 01 12:01:13 +0530 2015, Tue Dec 01 13:13:16 +0530 2015]
[JS, JSON]
[200 OK, 404 Not Authorized]
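The regex matches (shown here without Java string escaping):
- (?sm)^ - the start of a line (due to ^ and the m option; the s option lets . match newlines too)
- Started\s+ - the literal string Started and 1+ whitespace characters
- (?<requesttype>\S+) - group "requesttype" capturing 1+ non-whitespace characters
- \s+" - 1+ whitespace characters followed by "
- (?<url>\S+) - group "url" capturing 1+ non-whitespace characters
- "\s+for\s+ - " followed by 1+ whitespace characters, for, and 1+ whitespace characters
- (?<ip>\d+(?:\.\d+)+) - group "ip": 1+ digits followed by one or more repetitions of a . and 1+ digits
- \s+at\s+ - the word at surrounded by whitespace
- (?<tsp>[a-zA-Z]+\s+[a-zA-Z]+\s+\d+\s+\d+:\d+:\d+\s+\+\d+\s\d{4}) - group "tsp" (timestamp): letters and digits in the order seen in the sample input (day name, month name, day, hh:mm:ss, +offset, 4-digit year), separated by whitespace
- \s+ - 1+ whitespace characters
- (?:Processing\s+by\s+\S+)\s+as\s+ - Processing by followed by a word (1+ non-whitespace characters), then the word as surrounded by whitespace
- (?<requestformat>\S+) - group "requestformat" capturing 1+ non-whitespace characters
- (?:\s+Parameters:\s+\S+)? - an optional group matching Parameters: followed by whitespace and some non-whitespace characters
- (?:(?:(?:(?!\nStarted ).)*Completed\s)(?<responsecode>\d+(?:(?!\sin\s).)*))? - an optional group (because it is wrapped in (?:...)?) that matches any characters up to Completed without running into a newline followed by "Started " (thanks to the tempered greedy token (?:(?!\nStarted ).)*), then matches Completed followed by a whitespace character; finally (?<responsecode>\d+(?:(?!\sin\s).)*) captures into group "responsecode" 1+ digits followed by any characters up to the whole word in surrounded by whitespace.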
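The role of the tempered greedy token is easiest to see on a log where a request has no Completed line. The following Java sketch uses a hypothetical, trimmed-down log and a simplified version of the regex above that keeps only the request type and response code (in groups renamed type and code for brevity; TemperedTokenDemo and pairs are illustrative names). With the tempered token, the first request gets a null code because the optional group cannot cross the next Started line; with a plain .* in its place, the first match swallows the second request and reports its 404 code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedTokenDemo {
    // Hypothetical log: the first request has no "Completed" line
    public static final String LOG = String.join("\n",
        "Started GET \"/a\" for 127.0.0.1 at Tue Dec 01 12:01:13 +0530 2015",
        "Processing by MyController#method as JS",
        "Started POST \"/b\" for 127.0.1.1 at Tue Dec 01 13:13:16 +0530 2015",
        "Processing by MyController#method_2 as JSON",
        "Completed 404 Not Authorized in 65ms");

    // Tempered greedy token: consume any char as long as it does not start "\nStarted "
    public static final String TEMPERED =
        "(?sm)^Started\\s+(?<type>\\S+)(?:(?:(?!\nStarted ).)*Completed\\s(?<code>\\d+(?:(?!\\sin\\s).)*))?";
    // Naive variant: a plain .* (with DOTALL) can run past the next "Started" line
    public static final String NAIVE =
        "(?sm)^Started\\s+(?<type>\\S+)(?:.*Completed\\s(?<code>\\d+(?:(?!\\sin\\s).)*))?";

    // Returns "type=code" per match; code is null when the optional group did not participate
    public static List<String> pairs(String regex, String log) {
        Matcher m = Pattern.compile(regex).matcher(log);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group("type") + "=" + m.group("code"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs(TEMPERED, LOG)); // [GET=null, POST=404 Not Authorized]
        System.out.println(pairs(NAIVE, LOG));    // [GET=404 Not Authorized]
    }
}
```

Because the response-code part is optional, a request without a Completed line still yields its other fields instead of causing the whole block to be skipped.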