
Show just a specific regexp group and remove the rest of the line in bash with sed

I have an access log with many lines in the following format:

1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

I just want to get the response time, so in this example 2/2125012. My idea was to write a regex pattern that matches the content of the brackets in one group, and everything before and after it in other groups, so that I could replace the entire line with just this value:

^(.*)RESPONSE_TIME: \[([^\]]+)(.*)$

Using regex101 with an example input string, it gave me 2/2125012 as the second group, as expected:

Group 2 2/2125012

To use this pattern with sed, I escaped the brackets like this:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\([\^\]]+\)\(.*\)$#\2#g' testfile
1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

Why is nothing replaced? I escaped ( and [.

It seems that this has something to do with the square brackets:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\(.*\)\] (micro\(.*\)$#\2#g' testfile
2/2125012

This worked. But that pattern is not very specific. I'd like to make it more specific by using e.g. [0-9]+/[0-9]+ for the pattern inside the brackets instead of the (.*) wildcard pattern.

Your pattern contains an issue related to the use of POSIX BRE/ERE. Inside a bracket expression a backslash is a literal character, so [\^\]]+ does not mean "one or more chars other than ]": the bracket expression [\^\] matches a single char that is either \ or ^, and it is followed by a literal ] and, in BRE, a literal + char. To match any char other than ], write [^]]. For the quantifier, you need to use * (which matches 0 or more occurrences) instead of +, or \+ in GNU sed, or \{1,\} in generic POSIX BRE.
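The quantifier spellings mentioned above can be compared side by side. The following is only a sketch: the shortened sample line and the variable names are illustrative assumptions, and the \+ form relies on GNU sed.

```shell
# Shortened sample input (an assumption; the real log lines are longer).
line='RESPONSE_TIME: [2/2125012] (microseconds)'

# BRE: a bare + is a literal plus sign, so nothing matches and nothing is printed.
literal_plus=$(printf '%s\n' "$line" | sed -n 's#.*\[\([^]]+\).*#\1#p')

# GNU sed extension: \+ means "one or more" in BRE.
gnu_plus=$(printf '%s\n' "$line" | sed -n 's#.*\[\([^]]\+\).*#\1#p')

# Portable POSIX BRE: the interval \{1,\} means "one or more".
posix_interval=$(printf '%s\n' "$line" | sed -n 's#.*\[\([^]]\{1,\}\).*#\1#p')

# ERE (-E): + is a quantifier and groups use unescaped parentheses.
ere_plus=$(printf '%s\n' "$line" | sed -E -n 's#.*\[([^]]+).*#\1#p')

printf 'BRE +      : [%s]\n' "$literal_plus"
printf 'GNU \\+     : [%s]\n' "$gnu_plus"
printf 'BRE \\{1,\\} : [%s]\n' "$posix_interval"
printf 'ERE +      : [%s]\n' "$ere_plus"
```

The first result is empty because the substitution never matches; the other three all yield 2/2125012.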

You may fix the sed command by using

sed -n 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#p' testfile

See the online sed demo.

Details

  • -n - suppresses the default line output
  • .*RESPONSE_TIME: \[\([^]]*\).* - matches any 0+ chars, RESPONSE_TIME:, a space, [, then captures into Group 1 any zero or more chars other than ], and then matches the rest of the string
  • \1 - replaces the match with the Group 1 value
  • p - prints the result of the substitution.
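For the more specific digits/digits pattern that the question asks about, one possible variant uses ERE via sed -E, where + is a quantifier and groups need no backslashes. This is a minimal sketch, assuming a sed that supports -E (GNU and BSD sed do); extract_response_time is a hypothetical helper name, and the sample lines are shortened from the log line in the question.

```shell
# Hypothetical helper: prints only the digits/digits value inside the
# RESPONSE_TIME brackets. Reads the named files, or stdin when none are given.
extract_response_time() {
    # -n plus the p flag prints a line only when the substitution succeeded.
    sed -E -n 's#.*RESPONSE_TIME: \[([0-9]+/[0-9]+)\].*#\1#p' "$@"
}

sample='SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444]'

# Prints 2/2125012
printf '%s\n' "$sample" | extract_response_time

# A field that is not digits/digits does not match, so the line is skipped.
printf '%s\n' 'RESPONSE_TIME: [-] (microseconds)' | extract_response_time
```

In portable POSIX BRE the same group would be written as \([0-9]\{1,\}/[0-9]\{1,\}\).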

$ awk -F'[][]' '{print $14}' file
2/2125012

If that's not all you need, then edit your question to provide more truly representative sample input/output, including cases that the above doesn't work for.
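One case a fixed field number would not cover is a line with a different number of bracketed fields before RESPONSE_TIME. A hedged alternative is to look for the label and print the field that follows it. The sketch below keeps the same [ and ] field separator; response_time_by_label is a hypothetical name, and the shortened sample line is an assumption rather than real log data.

```shell
# Hypothetical helper: split on [ or ], find the field that ends with the
# RESPONSE_TIME: label, and print the field right after it.
response_time_by_label() {
    awk -F'[][]' '{
        for (i = 1; i < NF; i++)
            if ($i ~ /RESPONSE_TIME: *$/) { print $(i + 1); break }
    }' "$@"
}

# Fewer bracketed fields than the original line, so $14 would be empty here.
short='1.2.3.4:443 - - [11/Mar/2020:09:41:05 +0100] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds)'

# Prints 2/2125012
printf '%s\n' "$short" | response_time_by_label
```

Lines without the label produce no output, which may or may not be what you want.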
