Regexp pattern matching IP and UserAgent in a huge file
I have a huge log file that has a structure like this:
ip=X.X.X.X
userAgent=Firefox
-----
Referer=hxxp://www.bla.org
I want to create custom output like this: ip:userAgent
For example:
X.X.X.X:Firefox
The pattern should ignore lines that don't start with ip= or userAgent=. (These two must form a pair, as mentioned above.)
I am a newbie administrator and our client needs a sorted file immediately. Any help would be wonderful.
Thanks.
^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$
Apply it in global + multiline mode.
Group 1 will contain the IP, and group 2 will contain the user agent string.
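As a rough illustration of what "global + multiline mode" means in practice, here is a minimal Python sketch. The function name `extract_pairs` and the sample addresses are made up for the example, and it assumes the text being searched fits in memory, which may not hold for a truly huge file.

```python
import re

# Pattern from the answer above. re.MULTILINE makes ^ and $ match at line
# boundaries; finditer() plays the role of the "global" flag.
PAIR_RE = re.compile(r"^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$", re.MULTILINE)

def extract_pairs(text):
    """Return 'ip:userAgent' strings for every ip=/userAgent= pair in text."""
    return [f"{m.group(1)}:{m.group(2)}" for m in PAIR_RE.finditer(text)]

# Hypothetical sample data in the shape described in the question.
sample = (
    "ip=10.0.0.1\n"
    "userAgent=Firefox\n"
    "-----\n"
    "Referer=hxxp://www.bla.org\n"
    "ip=10.0.0.2\n"
    "userAgent=Opera\n"
)
print("\n".join(extract_pairs(sample)))
```

Note that with Windows line endings, `.+` would also capture the trailing `\r`, so the user agent may need stripping in that case.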
Edit: The expression above can be simplified a bit by dropping the IP address format check, assuming the log file contains nothing but real IP addresses:
^ip=((?:\d+\.?)+)[\r\n]+userAgent=(.+)$
You can use:
^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$
And
^userAgent=(.*)$
Take group 1 from each match and you will have the desired data.
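One way to combine these two patterns is to read the log line by line, so the whole file never has to be in memory. The sketch below is only an illustration under that assumption: `iter_pairs` is a hypothetical helper, and it treats an ip= line as paired only when the very next line is a userAgent= line.

```python
import re

# The two patterns from the answer above, applied to one line at a time.
IP_RE = re.compile(r"^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$")
UA_RE = re.compile(r"^userAgent=(.*)$")

def iter_pairs(lines):
    """Yield 'ip:userAgent' for each ip= line immediately followed by a userAgent= line."""
    pending_ip = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        ip_match = IP_RE.match(line)
        if ip_match:
            # Remember the IP and wait to see whether the next line completes the pair.
            pending_ip = ip_match.group(1)
            continue
        ua_match = UA_RE.match(line)
        if ua_match and pending_ip is not None:
            yield f"{pending_ip}:{ua_match.group(1)}"
        # Any non-ip line (matching or not) ends the current pairing attempt.
        pending_ip = None

# Hypothetical sample data; an open file object could be passed instead of a list.
sample = [
    "ip=10.0.0.1\n",
    "userAgent=Firefox\n",
    "-----\n",
    "Referer=hxxp://www.bla.org\n",
]
for pair in iter_pairs(sample):
    print(pair)
```

Because the function only iterates over its argument, passing a file opened with `open(...)` should stream the log one line at a time.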
Give this a try (it is in no way robust if some lines in your log file differ from the example snippet above):
sed -n -e '/^ip=/ {s///
N
s/\nuserAgent=/:/
p
}' HugeFile > customoutput