[英]Regular Expression for parsing apache error log files
I need to use a regular expression in a Java program to parse Apache error log files, for example:
[Thu Sep 27 12:08:18 2012] [error] [client 151.10.158.10] File does not exist: /srv/www/htdocs/pad/favicon.ico
[Thu Oct 04 17:02:42 2012] [error] [client 151.10.1.10] File does not exist: > /srv/www/htdocs/pad/favicon.ico
[Wed Oct 17 10:16:40 2012] [error] [client 151.10.14.60] File does not exist: /srv/www/htdocs/pad/sites/all/modules/fckeditor/fckeditor/editor/userfiles, referer: http://pad.sta.uniroma1.it/sites/all/modules/fckeditor/fckeditor/editor/fckeditor.html?InstanceName=edit-body&Toolbar=DrupalFull
I have tried several solutions, some of which have been posted on Stack Overflow before. The one that seems to work best is:
^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s/.(")-]+[\-:]) ([\w/\s]+)$
However, it does not seem to match the following string:
[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1
How can I fix this?
EDIT: I checked all the proposed solutions. They increase the number of matched lines, but they still cannot handle the following cases:
[Fri Jul 15 00:24:41 2011] [error] [client 219.12.35.141] script '/srv/www/htdocs/pad2/scripts/setup.php' not found or unable to stat
[Mon May 28 18:43:25 2012] [error] [client 88.110.28.25] Invalid URI in request GET HTTP/1.1 HTTP/1.1
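Please also note that it would be fine to receive all the data after the square brackets (including the client keyword) in a single group.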
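To capture the information encoded in the first three groups, look for [...]: the longest string that starts with [ and ends with ], with no other ] in between. As a Java string literal that is \\[[^\\]]+\\].
Capture the rest of the line with .*, which matches from the current position to the end of the line.
So your complete solution looks like this: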
^(\[[^\]]+\]) (\[[^\]]+\]) (\[[^\]]+\]) (.*)$
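Here is a minimal Java sketch that applies this pattern with java.util.regex. The class name ApacheErrorLine and the method parse are hypothetical. The sample lines are the two lines from the question's edit, which the more specific patterns could not handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApacheErrorLine {
    // ^(\[[^\]]+\]) (\[[^\]]+\]) (\[[^\]]+\]) (.*)$ with backslashes doubled for Java.
    // Each [^\]]+ stops at the first ']', so one bracketed field never swallows the next.
    private static final Pattern LINE =
            Pattern.compile("^(\\[[^\\]]+\\]) (\\[[^\\]]+\\]) (\\[[^\\]]+\\]) (.*)$");

    /** Returns {date, level, client, message}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        String[] samples = {
            "[Fri Jul 15 00:24:41 2011] [error] [client 219.12.35.141] script '/srv/www/htdocs/pad2/scripts/setup.php' not found or unable to stat",
            "[Mon May 28 18:43:25 2012] [error] [client 88.110.28.25] Invalid URI in request GET HTTP/1.1 HTTP/1.1"
        };
        for (String s : samples) {
            String[] g = parse(s);
            // Print the four groups separated by " | ", or report a non-matching line
            System.out.println(g == null ? "no match" : String.join(" | ", g));
        }
    }
}
```

The trade-off is that the message is not split any further. If you need more structure, apply a second, message-specific pattern to group 4.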
The following regular expression matches all of the error formats above.
^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s\/.(")-]+[\-:])\s*>?\s*([\w\/\s.]+)(?:\s*,(\s*\w+:)\s*([\w\/.=?:&-]+))?$
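The sketch below compiles this regex in Java so it can be checked against the sample lines. The class name ReferrerAwareParser is hypothetical. The hyphens inside the character classes are written as \- (same meaning, just explicit), and the double quote becomes \" inside the string literal. Checked against the question's samples, the regex matches the four original lines, including the referer line. It matches neither line from the edit, because group 4 must end in - or :, and neither of those messages contains either character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReferrerAwareParser {
    // The regex from the answer above; '/' needs no escaping in Java regex.
    private static final Pattern LINE = Pattern.compile(
            "^(\\[[\\w:\\s]+\\]) (\\[[\\w]+\\]) (\\[[\\w\\d.\\s]+\\])?"
            + "([\\w\\s/.(\")\\-]+[\\-:])\\s*>?\\s*([\\w/\\s.]+)"
            + "(?:\\s*,(\\s*\\w+:)\\s*([\\w/.=?:&\\-]+))?$");

    /** Returns groups 1..7 (null for optional groups that did not take part), or null on no match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }
}
```

There is no literal space between group 3 and group 4, so group 4 starts with a space (for example " File does not exist:"). Groups 6 and 7 hold the optional "referer:" label and URL.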
There is no space after the colon in "GET :81".
This works:
^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s\/.(")-]+[\-:])\s?([\w\/\s.]+)
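This pattern has no trailing $, so in Java it has to be applied with Matcher.lookingAt() (or find()) rather than matches(), which requires the whole line to match. The sketch below uses the hypothetical class name PrefixMatchDemo and compares this pattern with the question's original one on the "GET :81" line. The original fails on that line: it expects a space after the colon, and its last group does not allow a period.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixMatchDemo {
    // Pattern from the question: a literal space is required after the final - or :
    static final Pattern ORIGINAL = Pattern.compile(
            "^(\\[[\\w:\\s]+\\]) (\\[[\\w]+\\]) (\\[[\\w\\d.\\s]+\\])?([\\w\\s/.(\")\\-]+[\\-:]) ([\\w/\\s]+)$");
    // Pattern from this answer: optional space (\s?), '.' allowed in the last group, no trailing $
    static final Pattern FIXED = Pattern.compile(
            "^(\\[[\\w:\\s]+\\]) (\\[[\\w]+\\]) (\\[[\\w\\d.\\s]+\\])?([\\w\\s/.(\")\\-]+[\\-:])\\s?([\\w/\\s.]+)");

    /** Group 5 of FIXED, matched from the start of the line only (lookingAt), or null. */
    static String tail(String line) {
        Matcher m = FIXED.matcher(line);
        return m.lookingAt() ? m.group(5) : null;
    }
}
```

Because nothing anchors the end, group 5 simply stops at the first character outside [\w/\s.]. On the referer line it ends at "userfiles", and the rest of the line is silently ignored.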
The last part of your regex seems to be incorrect. This simplified regex should work:
^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\]) ([\s\w/.(")-]+[-:])(.+)$
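A quick Java check of this simplified pattern follows; the class name SimplifiedParser is hypothetical. Here the space before group 4 is a literal, so group 4 has no leading space. Group 5 keeps everything after the colon, including a leading space or the ">" seen in one of the sample lines. The pattern still needs a - or : somewhere in the message, so neither line from the question's edit matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplifiedParser {
    // ^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\]) ([\s\w/.(")-]+[-:])(.+)$
    private static final Pattern LINE = Pattern.compile(
            "^(\\[[\\w:\\s]+\\]) (\\[[\\w]+\\]) (\\[[\\w\\d.\\s]+\\]) ([\\s\\w/.(\")\\-]+[\\-:])(.+)$");

    /** Returns {message prefix, remainder} (groups 4 and 5), or null if the line does not match. */
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new String[] {m.group(4), m.group(5)} : null;
    }
}
```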
Try this:
<?php
// Three log lines, each terminated by "\n"
$a  = "[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1\n";
$a .= "[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1\n";
$a .= "[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1\n";
// Groups 1-3: the bracketed fields; group 4: message text up to the colon; group 5: the rest of the line
preg_match_all("/(\[.*\])\s+(\[.*\])\s+(\[.*\])\s+([a-zA-Z0-9\s]+:)\s*(.*)/", $a, $m);
var_dump($m);
Output:
array (size=6)
0 =>
array (size=3)
0 => string '[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=128)
1 => string '[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET :81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=128)
2 => string '[Thu May 17 22:41:54 2012] [error] [client 118.238.211.206] Invalid URI in request GET : 81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=129)
1 =>
array (size=3)
0 => string '[Thu May 17 22:41:54 2012]' (length=26)
1 => string '[Thu May 17 22:41:54 2012]' (length=26)
2 => string '[Thu May 17 22:41:54 2012]' (length=26)
2 =>
array (size=3)
0 => string '[error]' (length=7)
1 => string '[error]' (length=7)
2 => string '[error]' (length=7)
3 =>
array (size=3)
0 => string '[client 118.238.211.206]' (length=24)
1 => string '[client 118.238.211.206]' (length=24)
2 => string '[client 118.238.211.206]' (length=24)
4 =>
array (size=3)
0 => string 'Invalid URI in request GET :' (length=28)
1 => string 'Invalid URI in request GET :' (length=28)
2 => string 'Invalid URI in request GET :' (length=28)
5 =>
array (size=3)
0 => string '81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=40)
1 => string '81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=40)
2 => string '81/phpmyadmin/scripts/setup.php HTTP/1.1' (length=40)
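For the asker's Java program, the same preg_match_all pattern can be run with a Matcher.find() loop. This is a sketch with the hypothetical class name MatchAllPort. It returns one row per match, the layout PHP calls PREG_SET_ORDER, rather than one array per group as in the var_dump above. As in PCRE, Java's '.' does not match "\n" by default, so each (\[.*\]) stays on one line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchAllPort {
    // Same pattern as the PHP preg_match_all call, with backslashes doubled for Java.
    private static final Pattern ENTRY = Pattern.compile(
            "(\\[.*\\])\\s+(\\[.*\\])\\s+(\\[.*\\])\\s+([a-zA-Z0-9\\s]+:)\\s*(.*)");

    /** One String[6] per match: index 0 is the whole match, 1..5 are the groups. */
    static List<String[]> matchAll(String text) {
        List<String[]> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            String[] row = new String[6];
            for (int i = 0; i <= 5; i++) {
                row[i] = m.group(i);
            }
            result.add(row);
        }
        return result;
    }
}
```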