Show only a specific regex group and remove the rest of the line using sed in bash

Question

I have an access log with many lines in the following format:

1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

I just want to get the response time, so in this example 2/2125012. My idea was to write a regex pattern that matches the bracket's content in one group, and everything before/after it in other groups, so I could replace the entire line with just this value:

^(.*)RESPONSE_TIME: \[([^\]]+)(.*)$

Using regex101 with an example input string, it gave me 2/2125012 as the second group, as expected:

Group 2 2/2125012

To use this pattern with sed, I escaped the brackets like this:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\([\^\]]+\)\(.*\)$#\2#g' testfile
1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

Why is nothing replaced? I escaped ( and [.

It seems that this has something to do with the square brackets:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\(.*\)\] (micro\(.*\)$#\2#g' testfile
2/2125012

This worked. But this pattern is not very specific. I'd like to make it more specific by using e.g. [0-9]+/[0-9]+ for the pattern inside the brackets instead of the (.*) wildcard pattern.

Answer 1

Your pattern contains an issue related to the use of POSIX BRE/ERE. Inside a bracket expression a backslash is a literal character, so [\^\]]+ matches a single char that is either \ or ^, followed by a literal ] and, in BRE, a literal + char. To match chars other than ], put ] right after the negating ^, as in [^]]. For the quantifier, you need to use * (which matches 0 or more occurrences) instead of +, or \+ in GNU sed, or \{1,\} in generic POSIX BRE.
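To make the point about the bracket expression concrete, here is a minimal sketch. The `sample` string is a made-up input (not from the log) that contains the literal text the broken pattern actually looks for, and the second command uses a shortened, hypothetical fragment of the log line.

```shell
# Hypothetical input: contains the literal sequence ^]+ that the broken pattern matches.
sample='a^]+b'

# In BRE, [\^\] is a bracket expression (backslash or caret);
# the ] and + that follow are ordinary literal characters.
printf '%s\n' "$sample" | sed 's#[\^\]]+#X#'
# prints: aXb

# [^]]* is the negated form: a ] placed right after ^ is a member of the set,
# so this captures everything up to the first closing bracket.
printf '%s\n' '[2/2125012] rest' | sed 's#\[\([^]]*\).*#\1#'
# prints: 2/2125012
```

Since the real log line never contains the text ^]+ or \]+, the original substitution finds no match and sed prints the line unchanged.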

You may fix the sed command by using:

sed -n 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#p' testfile

See the online sed demo.

Details

-n - suppresses the default line output
.*RESPONSE_TIME: \[\([^]]*\).* - matches any 0+ chars, RESPONSE_TIME:, a space, [, then captures into Group 1 any zero or more chars other than ], and then matches the rest of the string
\1 - replaces the match with the Group 1 value
p - prints the result of the substitution.
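If you want the stricter digits/digits form mentioned in the question, the same command can be adapted along these lines. This is only a sketch: the `line` variable is an abbreviated copy of the sample log line, the function names `extract_bre` and `extract_ere` are made up for illustration, and the second variant assumes your sed supports the -E option (GNU and BSD sed do).

```shell
# Abbreviated sample line (assumption: real lines have the same RESPONSE_TIME layout).
line='1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444]'

# Portable POSIX BRE: \{1,\} stands in for "one or more".
extract_bre() {
  sed -n 's#.*RESPONSE_TIME: \[\([0-9]\{1,\}/[0-9]\{1,\}\)].*#\1#p'
}

# ERE via -E: groups and + need no backslashes, while the literal [ does.
extract_ere() {
  sed -n -E 's#.*RESPONSE_TIME: \[([0-9]+/[0-9]+)].*#\1#p'
}

printf '%s\n' "$line" | extract_bre
printf '%s\n' "$line" | extract_ere
```

Both calls should print 2/2125012 for the sample line. Because of -n and the p flag, lines that do not contain a digits/digits response time produce no output at all.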

Answer 2

$ awk -F'[][]' '{print $14}' file
2/2125012

If that's not all you need, then edit your question to provide more truly representative sample input/output, including cases that the above doesn't work for.
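For context, -F'[][]' sets the field separator to a bracket expression matching either ] or [, so the contents of each bracket pair land in the even-numbered fields and the RESPONSE_TIME value is field 14 in the sample line. If the number of bracketed fields before it could vary, one possible variation is to look for the label instead of a fixed position. This is a sketch only: the function name `response_time` and the shortened `line` are made up for illustration.

```shell
# Shortened, hypothetical log fragment with fewer bracketed fields than the real line.
line='SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds)'

# Split on [ or ], then print the field right after the one ending in "RESPONSE_TIME: ".
response_time() {
  awk -F'[][]' '{
    for (i = 1; i < NF; i++)
      if ($i ~ /RESPONSE_TIME: $/) { print $(i + 1); break }
  }'
}

printf '%s\n' "$line" | response_time
# prints: 2/2125012
```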

Answer 1
1 Accepted 2020-03-11 13:19:10

Answer 2
1 2020-03-11 14:02:51