Ruby Regex for the Common Log Format

Question

Hey guys, I'm looking for a regular expression that will parse a line in the Common Log Format and give me its 7 fields:

IP
identity
username
time
request
status
size of the object

Has anybody already implemented this regex?

Answer 1

Input:

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Regex:

(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)

The capture groups are numbered as in the breakdown below.

Breakdown:

Group         Regex         Match
#1 IP         (\S+)         127.0.0.1
#2 Identity   (\S+)         user-identifier
#3 Username   (\S+)         frank
#4 Time       (\[.*?\])     [10/Oct/2000:13:55:36 -0700]
#5 Request    (".*?")       "GET /apache_pb.gif HTTP/1.0" 
#6 Status     (\S+)         200
#7 Size       (\S+)         2326
each separated by a \s+
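
As a rough sketch of how the regex above could be used from Ruby: the pattern is copied unchanged from this answer, while the names CLF_PATTERN and parse_clf are hypothetical and only there for illustration. Note that groups 4 and 5 keep the surrounding brackets and quotes.

```ruby
# Pattern copied from the answer above; constant and method names are illustrative.
CLF_PATTERN = /(\S+)\s+(\S+)\s+(\S+)\s+(\[.*?\])\s+(".*?")\s+(\S+)\s+(\S+)/

# Returns the seven captured fields as an array of strings,
# or nil when the line does not match the pattern.
def parse_clf(line)
  match = CLF_PATTERN.match(line)
  match && match.captures
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
ip, identity, username, time, request, status, size = parse_clf(line)

puts ip       # 127.0.0.1
puts time     # [10/Oct/2000:13:55:36 -0700]
puts request  # "GET /apache_pb.gif HTTP/1.0"
puts status   # 200
```

All fields come back as strings, so status and size would still need a to_i if numbers are wanted.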

Answer 2

I would just extract the time and the request first; after that it is just a simple split:

a = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

time    = a.slice!(/\[.*?\]/)
request = a.slice!(/".*"/)
ip, identity, username, status, size = a.split
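
Since slice! modifies the string in place, the snippet above changes a and would raise on a frozen string. A possible way around that, sketched here with a hypothetical helper named split_clf, is to work on a copy and return the fields as a hash:

```ruby
# Hypothetical helper wrapping the slice!/split approach from this answer.
# It operates on a duplicate, so the caller's string is left untouched.
def split_clf(line)
  rest    = line.dup
  time    = rest.slice!(/\[.*?\]/)  # removes and returns the bracketed timestamp
  request = rest.slice!(/".*"/)     # removes and returns the quoted request
  ip, identity, username, status, size = rest.split
  { ip: ip, identity: identity, username: username,
    time: time, request: request, status: status, size: size }
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'.freeze
entry = split_clf(line)

puts entry[:request]  # "GET /apache_pb.gif HTTP/1.0"
puts entry[:size]     # 2326
```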

Answer 3

I also came up with my own regex, which additionally splits out the verb, URI, and HTTP version.

^([\d\.]*)\s([\w|-]*)\s([\w|-]*)\s\[(.*)\]\s\"([\w]*)\s(.*)\s(.*)\"\s([\d]*)\s([\d]*)$
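
A minimal sketch of using this pattern from Ruby, with the nine captures mapped to keys. The pattern is copied as written above; the names CLF_DETAILED, FIELDS, and parse_clf_detailed are assumptions made for the example.

```ruby
# Pattern copied from the answer above; all names here are illustrative.
CLF_DETAILED = /^([\d\.]*)\s([\w|-]*)\s([\w|-]*)\s\[(.*)\]\s\"([\w]*)\s(.*)\s(.*)\"\s([\d]*)\s([\d]*)$/
FIELDS = %i[ip identity username time verb uri version status size].freeze

# Returns a hash of field name => captured string, or nil if the line does not match.
def parse_clf_detailed(line)
  match = CLF_DETAILED.match(line)
  match && FIELDS.zip(match.captures).to_h
end

line = '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_clf_detailed(line)

puts entry[:verb]     # GET
puts entry[:uri]      # /apache_pb.gif
puts entry[:version]  # HTTP/1.0
puts entry[:time]     # 10/Oct/2000:13:55:36 -0700
```

Two things to keep in mind with this pattern: the last group accepts only digits, so a line whose size field is logged as - will not match, and inside a character class the | in [\w|-] is a literal pipe rather than an alternation.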

Answer 1
1 2013-11-18 13:27:48

Answer 2
1 accepted 2013-11-18 13:44:36

Answer 3
0 2013-11-18 14:17:20
