
Eliminate lines not matching multiline pattern

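I'm searching through a log file, trying to determine the total time users are logged in. I've already eliminated every line that isn't related to logging in or out. However, for various reasons some login lines have no corresponding logout line, and I'd like to eliminate those. For example: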

2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

I only want:

2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

This awk one-liner does the trick (at least for your example; I can't see your real file):

awk -F\[ '{a[$2]=$0;}END{for(x in a)print a[x]}' file

Testing with your data:

kent$  echo "2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection"|awk -F\[ '{a[$2]=$0;}END{for(x in a)print a[x]}'
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

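For lines with the same message (everything after the `[`), only the last one is printed. Note that awk does not guarantee any particular order for `for (x in a)`, so the kept lines are not guaranteed to come out in file order.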
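If the output must follow the input order, one option (a sketch, not part of the original answer) is to record the order in which each message first appears and walk that list in `END` instead of using `for (x in a)`. The helper name `last_per_message` is hypothetical, and the key is taken with `substr`/`index` rather than `-F\[` so that no field-separator regex is involved.

```shell
#!/bin/sh
# Order-preserving variant of: awk -F\[ '{a[$2]=$0;}END{for(x in a)print a[x]}'
# last_per_message is a hypothetical helper name.
last_per_message() {
  awk '
    { msg = substr($0, index($0, "[")) }    # key: "[INFO] User logged in", etc.
    !(msg in last) { order[++n] = msg }     # record the order keys first appear
    { last[msg] = $0 }                      # keep only the latest line per key
    END { for (i = 1; i <= n; i++) print last[order[i]] }
  ' "$@"
}

# Usage: last_per_message file
```

Like the original one-liner, this keeps exactly one line per distinct message, so it only fits a file with a single login/logout block.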

EDIT

I think your real file may look like this: you may have several login/lost-connection blocks, for example:

kent$  cat file
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
2013-04-08 09:11:00 [INFO] User logged in
2013-04-08 09:12:56 [INFO] User logged in
2013-04-08 09:15:43 [INFO] User lost connection

Then this one works for you:

awk '/lost/{print a;print;next;}{a=$0}' file

The output is:

2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
2013-04-08 09:12:56 [INFO] User logged in
2013-04-08 09:15:43 [INFO] User lost connection

Assuming there are never multiple `User lost connection` lines in a row, the following should work:

sed '/User logged in/{h;d};H;x' file

Or, if your sed does not accept `;` as a command separator:

sed -e '/User logged in/{h
d
}' -e 'H' -e 'x' file
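To show how the hold-space version works, here is the same sed program spread over several lines with a comment on each step. The helper name `pair_with_login` is hypothetical; the `#` comment lines are accepted by GNU and BSD sed, but if your sed rejects them, simply drop them.

```shell
#!/bin/sh
# Annotated, multi-line form of: sed '/User logged in/{h;d};H;x' file
# pair_with_login is a hypothetical helper name.
pair_with_login() {
  sed '
/User logged in/{
# Login line: copy it into the hold space (replacing any older login)...
h
# ...then delete it, which ends this cycle without printing anything.
d
}
# Any other line: append it to the hold space, giving "login\nlost"...
H
# ...then swap, so that pair is printed and the current line is held.
x
' "$@"
}

# Usage: pair_with_login file
```

Because each new login overwrites the hold space with `h`, only the login immediately preceding a `lost connection` line survives to be printed.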

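Here is an awk solution. The script remembers the previous line (every line is stored in `x`); when the current line does not contain the string "logged in", it prints the remembered line followed by the current one. This can be a problem if two "lost connection" lines can appear right next to each other. Awk is also a good choice for filtering out the other lines.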

#!/bin/bash

awk '!/logged in/ {print x"\n"$0} {x = $0}' <<EOT
2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
EOT
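One way to avoid the consecutive "lost connection" problem mentioned above (a sketch, not from the original answer) is to remember only login lines and clear the stored login once it has been printed. The helper name `pair_sessions` is hypothetical. Non-login lines with no pending login are silently skipped.

```shell
#!/bin/sh
# Variant of the awk script above: remember only login lines, and forget the
# stored login once it has been printed. pair_sessions is a hypothetical name.
pair_sessions() {
  awk '
    /User logged in/ { login = $0; next }   # keep only the latest login line
    login != "" {                           # first non-login line after a login:
      print login                           #   print the stored login,
      print                                 #   then the current line,
      login = ""                            #   and clear it so it is not reused
    }
  ' "$@"
}

# Usage: pair_sessions file
```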

This might work for you (GNU sed):

sed -r '$!N;/(User logged in)\n.*\1/D' file
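The same GNU sed one-liner, split into steps with comments, to show how the `N`/`D` sliding window drops a login whenever the next line is also a login. It relies on GNU-specific behavior (`-r` and a `\1` back-reference inside an extended regex); the helper name `drop_unmatched_logins` is hypothetical.

```shell
#!/bin/sh
# Annotated form of: sed -r '$!N;/(User logged in)\n.*\1/D' file
# Requires GNU sed. drop_unmatched_logins is a hypothetical helper name.
drop_unmatched_logins() {
  sed -r '
# Unless this is the last line, append the next input line to the pattern space.
$!N
# If both lines are logins, delete the first one (up to the newline) and
# restart the cycle with the second, without reading a new line. Otherwise
# the two-line pattern space is printed at the end of the cycle.
/(User logged in)\n.*\1/D
' "$@"
}

# Usage: drop_unmatched_logins file
```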


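Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original URL. For any questions, contact yoyou2525@163.com.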

© 2020-2024 STACKOOM.COM