Regex pattern to match IP and UserAgent in a huge file

Question

I have a huge log file with a structure like this:

ip=X.X.X.X
userAgent=Firefox
-----
Referer=hxxp://www.bla.org

I want to create custom output like this: ip:userAgent

For example:

X.X.X.X:Firefox

The pattern should ignore lines that don't start with ip= or userAgent= (the two must form a pair, as shown above).

I am a new administrator, and our client needs the sorted file immediately. Any help would be wonderful. Thanks.

Answer 1

^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$

Apply it in global + multiline mode.

Group 1 will contain the IP address and group 2 the user agent string.
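A minimal sketch of applying this pattern in Python, assuming the whole log fits in memory as one string (for a truly huge file, a line-by-line approach is safer). `re.MULTILINE` supplies the multiline mode, and iterating with `re.finditer` stands in for the global flag. The helper name `extract_pairs` and the sample addresses are made up for illustration.

```python
import re

# The pattern from this answer. re.MULTILINE lets ^ and $ match at every
# line boundary; finditer returns every match, like a "global" flag.
PAIR_RE = re.compile(r"^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$", re.MULTILINE)


def extract_pairs(text):
    """Hypothetical helper: return 'ip:userAgent' for each ip= line
    that is directly followed by a userAgent= line."""
    return [f"{m.group(1)}:{m.group(2)}" for m in PAIR_RE.finditer(text)]


# Made-up sample in the format from the question.
sample = """ip=10.0.0.1
userAgent=Firefox
-----
Referer=hxxp://www.bla.org
ip=10.0.0.2
userAgent=Opera
"""

for pair in extract_pairs(sample):
    print(pair)
```

An ip= line that is not immediately followed by userAgent= produces no match, which is the pairing rule asked for in the question.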

Edit: The expression above can be simplified a bit by dropping the strict IP format check, assuming the log file contains nothing but real IP addresses:

^ip=([\d.]+)[\r\n]+userAgent=(.+)$

(Digits and dots go into one character class because a repeated group such as (\d+\.?)+ keeps only its last repetition, so group 1 would hold just the final octet.)
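A quick check in Python's `re` module shows how a quantified capture group behaves compared with a single group around a character class. The sample line `ip=192.168.1.20` is made up for illustration.

```python
import re

line = "ip=192.168.1.20"  # made-up sample line

# Quantified capture group: after matching, the group holds only
# the text of its LAST repetition.
repeated = re.match(r"^ip=(\d+\.?)+$", line)
print(repeated.group(1))  # 20

# One group around a character class captures the whole address.
whole = re.match(r"^ip=([\d.]+)$", line)
print(whole.group(1))  # 192.168.1.20
```

The character-class form also avoids nesting one quantifier inside another, a shape that can backtrack heavily on lines that fail to match.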

Answer 2

You can use:

^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$

And

^userAgent=(.*)$

Take group 1 from each match and you will have the data you need.
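These two patterns each match a single line, so pairing the results is left to the caller. One way to do it, sketched in Python under the assumption that an ip= line must be immediately followed by its userAgent= line, is to read the file once and remember the last IP seen. Memory use then stays flat no matter how large the file is. `pair_lines` is a hypothetical helper name.

```python
import re

# The two per-line patterns from this answer.
IP_RE = re.compile(r"^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$")
UA_RE = re.compile(r"^userAgent=(.*)$")


def pair_lines(lines):
    """Hypothetical helper: yield 'ip:userAgent' for each ip= line that is
    immediately followed by a userAgent= line."""
    pending_ip = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop the line ending before matching
        ip_match = IP_RE.match(line)
        if ip_match:
            pending_ip = ip_match.group(1)
            continue
        ua_match = UA_RE.match(line)
        if ua_match and pending_ip is not None:
            yield f"{pending_ip}:{ua_match.group(1)}"
        # Any other line, or a userAgent= with no ip= right before it,
        # breaks the pair.
        pending_ip = None


sample = ["ip=10.0.0.1\n", "userAgent=Firefox\n", "-----\n", "userAgent=Orphan\n"]
print(list(pair_lines(sample)))  # ['10.0.0.1:Firefox']
```

An open file object can be passed in place of the list, since iterating over a file yields its lines one at a time.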

Answer 3

Give this a try (it is not at all robust if some lines in your log file differ from the example snippet above):

sed -n -e '/^ip=/ {s///
N
s/\nuserAgent=/:/
p 
}' HugeFile > customoutput
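For readers without sed, here is a rough Python counterpart of the script above. It is a sketch, not an exact port. Like `N`, it consumes the line after each ip= line. Unlike the sed version, it writes output only when that line really starts with userAgent=, so stray lines do not leak into the result. `sed_like` is a hypothetical name.

```python
import io


def sed_like(src, dst):
    """Hypothetical sketch of the sed script: on an ip= line, pull in the
    next line and join the two with ':'. Stricter than the sed version:
    the pair is skipped unless the next line starts with userAgent=."""
    lines = iter(src)
    for line in lines:
        if not line.startswith("ip="):
            continue
        ip = line[len("ip="):].rstrip("\r\n")  # like s///: drop the ip= prefix
        nxt = next(lines, None)                # like N: read the following line
        if nxt is not None and nxt.startswith("userAgent="):
            agent = nxt[len("userAgent="):].rstrip("\r\n")
            dst.write(f"{ip}:{agent}\n")       # like s/\nuserAgent=/:/ then p


out = io.StringIO()
sed_like(io.StringIO("ip=1.2.3.4\nuserAgent=Firefox\n-----\nReferer=x\n"), out)
print(out.getvalue(), end="")  # 1.2.3.4:Firefox
```

To mirror `HugeFile > customoutput`, open both files and pass them in: `with open("HugeFile") as src, open("customoutput", "w") as dst: sed_like(src, dst)`. As with `N` in sed, a line consumed as the "next line" is not examined again, so an ip= line directly followed by another ip= line causes the second one to be skipped.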

Answer scores and posting times:
Answer 1: 3, 2009-02-11 12:25:36
Answer 2: 0, 2009-02-11 12:22:38
Answer 3: 0, 2009-02-13 07:21:06