
Ruby Regex for Common Log Format

Hey guys, I'm looking for a regular expression that parses a line in the Common Log Format and gives me its 7 fields:

  • IP
  • identity
  • username
  • time
  • request
  • status
  • size of the object.

Has anybody already implemented this regex?

Input:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Regex:

(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)

The capture groups are numbered as in the breakdown below.

Breakdown:

Group         Regex         Match
#1 IP         (\S+)         127.0.0.1
#2 Identity   (\S+)         user-identifier
#3 Username   (\S+)         frank
#4 Time       (\[.*?\])     [10/Oct/2000:13:55:36 -0700]
#5 Request    (".*?")       "GET /apache_pb.gif HTTP/1.0" 
#6 Status     (\S+)         200
#7 Size       (\S+)         2326
Consecutive groups are separated by \s+.
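
For reference, here is a minimal sketch of how this regex might be applied in Ruby. The method name `parse_clf` and the symbol keys are assumptions chosen for illustration; only the pattern itself comes from the answer above.

```ruby
# Hypothetical helper: returns a Hash of the seven fields, or nil when
# the line does not match the pattern.
def parse_clf(line)
  pattern = /(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)/
  fields  = %i[ip identity username time request status size]

  match = pattern.match(line)
  return nil unless match

  # Pair each field name with its capture group, in order.
  fields.zip(match.captures).to_h
end

line  = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_clf(line)

puts entry[:ip]      # 127.0.0.1
puts entry[:time]    # [10/Oct/2000:13:55:36 -0700]
puts entry[:status]  # 200
```

Note that groups #4 and #5 keep their surrounding brackets and quotes, and every value comes back as a String, so the status and size would still need `to_i` if numbers are wanted.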

I would just pull out the time and the request first; after that, the rest is a simple split:

a = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

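# slice! removes the matched text from a and returns it
time    = a.slice!(/\[.*?\]/)
request = a.slice!(/".*"/)
# only whitespace-separated fields are left in a
ip, identity, username, status, size = a.split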
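
Since `slice!` modifies its receiver, the snippet above consumes `a`. Here is a sketch of the same idea wrapped in a method that works on a copy; the name `split_clf` and the order of the returned array are assumptions for illustration.

```ruby
# Hypothetical helper: same slice-then-split approach, but the caller's
# string is left untouched because the work is done on a copy.
def split_clf(line)
  rest = line.dup                    # dup returns an unfrozen copy
  time    = rest.slice!(/\[.*?\]/)   # cut out the bracketed timestamp
  request = rest.slice!(/".*"/)      # cut out the quoted request
  ip, identity, username, status, size = rest.split
  [ip, identity, username, time, request, status, size]
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
ip, identity, username, time, request, status, size = split_clf(line)

puts request  # "GET /apache_pb.gif HTTP/1.0"
puts size     # 2326
```

Working on a copy should also avoid a `FrozenError` when the input is frozen, for example under `# frozen_string_literal: true`.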

I also came up with my own regex, which additionally splits the request into the verb, URI and HTTP version.

^([\d\.]*)\s([\w\-]*)\s([\w\-]*)\s\[(.*)\]\s\"([\w]*)\s(.*)\s(.*)\"\s([\d]*)\s([\d]*)$
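
A sketch of how a pattern like this could be used in Ruby with named groups, so the fields can be read by name instead of by position. The group names, the method name `parse_clf_request` and the `/x` layout are assumptions for illustration; the identity and username classes are written as `[\w\-]`.

```ruby
# Hypothetical helper: returns a Hash keyed by group name (String keys),
# or nil when the line does not match.
def parse_clf_request(line)
  pattern = /
    ^(?<ip>[\d.]*)\s
    (?<identity>[\w\-]*)\s
    (?<username>[\w\-]*)\s
    \[(?<time>.*)\]\s
    "(?<verb>\w*)\s(?<uri>.*)\s(?<version>.*)"\s
    (?<status>\d*)\s
    (?<size>\d*)$
  /x  # extended mode: literal whitespace and newlines in the pattern are ignored

  match = pattern.match(line)
  match && match.named_captures
end

line  = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_clf_request(line)

puts entry["verb"]     # GET
puts entry["uri"]      # /apache_pb.gif
puts entry["version"]  # HTTP/1.0
```

One thing to watch for: because the size group only accepts digits, a line that logs the size as `-` will not match, and the method returns nil for it.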
