Ruby Regex for Common Log Format
Hey guys, I'm looking for a regular expression that will parse a line in the Common Log Format and give me its 7 fields.
Has anybody already implemented this regex?
Input:
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Regex:
(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)
The capture groups are numbered as in the breakdown below.
Breakdown:
Group  Field     Regex      Match
#1     IP        (\S+)      127.0.0.1
#2     Identity  (\S+)      user-identifier
#3     Username  (\S+)      frank
#4     Time      (\[.*?\])  [10/Oct/2000:13:55:36 -0700]
#5     Request   (".*?")    "GET /apache_pb.gif HTTP/1.0"
#6     Status    (\S+)      200
#7     Size      (\S+)      2326
Consecutive groups are separated by \s+ (one or more whitespace characters).
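For reference, here is a minimal sketch of how the regex above could be applied in Ruby. The names CLF_PATTERN and parse_clf_line are my own, not from any library, and the sketch assumes one log entry per string.

```ruby
# Hypothetical names: CLF_PATTERN and parse_clf_line are illustrative only.
CLF_PATTERN = /(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)/

# Returns an array with the 7 captured fields, or nil if the line does not match.
def parse_clf_line(line)
  match = CLF_PATTERN.match(line)
  match && match.captures
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
ip, identity, username, time, request, status, size = parse_clf_line(line)

puts ip      # 127.0.0.1
puts time    # [10/Oct/2000:13:55:36 -0700]
puts request # "GET /apache_pb.gif HTTP/1.0"
puts size    # 2326
```

Note that groups #4 and #5 keep their surrounding brackets and quotes, and every field comes back as a String, so status and size would still need to_i if you want numbers.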
I would just pull out the time and the request first; after that, it is a simple split:
a = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
time = a.slice!(/\[.*?\]/)
request = a.slice!(/".*"/)
ip, identity, username, status, size = a.split
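One thing to keep in mind with this approach is that slice! modifies the string it is called on (and raises FrozenError on a frozen string). A rough sketch of the same idea wrapped in a method that works on a copy, where split_clf_line is a made-up name:

```ruby
# Hypothetical helper: split_clf_line is an illustrative name.
def split_clf_line(line)
  rest = line.dup                # work on a copy, because slice! mutates its receiver
  time = rest.slice!(/\[.*?\]/)  # removes and returns the bracketed timestamp
  request = rest.slice!(/".*"/)  # removes and returns the quoted request
  ip, identity, username, status, size = rest.split
  { ip: ip, identity: identity, username: username,
    time: time, request: request, status: status, size: size }
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = split_clf_line(line)
puts entry[:request] # "GET /apache_pb.gif HTTP/1.0"
puts entry[:status]  # 200
```

The caller's string is left untouched, which matters if the same line is logged or reused afterwards.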
I also came up with my own regex, which additionally splits the request into the verb, URI, and HTTP version:
^([\d\.]*)\s([\w|-]*)\s([\w|-]*)\s\[(.*)\]\s\"([\w]*)\s(.*)\s(.*)\"\s([\d]*)\s([\d]*)$
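A small sketch of how this regex could be used, with the pattern copied verbatim from above. VERBOSE_CLF, FIELD_NAMES, and parse_clf_verbose are names I made up for the example; the field order is taken from the order of the groups in the pattern.

```ruby
# Regex copied verbatim from the post; the constant and method names are hypothetical.
VERBOSE_CLF = /^([\d\.]*)\s([\w|-]*)\s([\w|-]*)\s\[(.*)\]\s\"([\w]*)\s(.*)\s(.*)\"\s([\d]*)\s([\d]*)$/
FIELD_NAMES = %i[ip identity username time verb uri http_version status size].freeze

# Returns a Hash of field name => captured string, or nil if the line does not match.
def parse_clf_verbose(line)
  match = VERBOSE_CLF.match(line)
  return nil unless match

  FIELD_NAMES.zip(match.captures).to_h
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_clf_verbose(line)
puts entry[:verb]         # GET
puts entry[:uri]          # /apache_pb.gif
puts entry[:http_version] # HTTP/1.0
```

Unlike the first regex, this one drops the brackets and quotes from the captures. It also only accepts digits for the size, so a line whose size is logged as - would not match and the method would return nil.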