Show just a specific regex group and remove the rest of the line in bash with sed
I have an access log with many lines in the following format:
1.2.3.4:443 - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"
I just want to get the response time, so in this example 2/2125012.
My idea was to write a regex pattern that captures the bracket contents in one group and everything before/after it in other groups, so I could replace the entire line with just this value:
^(.*)RESPONSE_TIME: \[([^\]]+)(.*)$
Using regex101 with an example input string, it gave me the expected second group:
Group 2 2/2125012
To use this pattern with sed, I escaped the parentheses and brackets like this:
$ sed 's#^\(.*\)RESPONSE_TIME: \[\([\^\]]+\)\(.*\)$#\2#g' testfile
1.2.3.4:443 - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"
Why is nothing replaced? I escaped ( and [.
It seems that this has something to do with the square brackets:
$ sed 's#^\(.*\)RESPONSE_TIME: \[\(.*\)\] (micro\(.*\)$#\2#g' testfile
2/2125012
This worked. But this pattern is not very specific. I'd like to make it more specific by using e.g. [0-9]+/[0-9]+ for the pattern inside the brackets instead of the (.*) wildcard.
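Your pattern contains an issue related to POSIX BRE/ERE syntax. Inside a POSIX bracket expression a backslash is an ordinary character, so [\^\] is a set matching a single \ or ^. In [\^\]]+ that set is followed by a literal ] and, since + is not a quantifier in BRE, by a literal + char. To match any char other than ], put the ] first in a negated bracket expression: [^]].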
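To see what the bracket expression from the question actually matches, here is a small sketch. It assumes GNU sed, although POSIX BRE semantics should give the same result. It feeds sed a made-up line containing ^]+ after the [, and then a line in the real log format. The variable name pattern is only for illustration.

```shell
#!/bin/sh
# The question's bracket expression, reduced to the relevant part.
# [\^\] is a set of "\" and "^"; the following "]" and "+" are literal in BRE.
pattern='s#RESPONSE_TIME: \[\([\^\]]+\)#<\1>#p'

# A made-up line with "^]+" after the "[": this one is substituted.
printf '%s\n' 'RESPONSE_TIME: [^]+' | sed -n "$pattern"

# A line in the real log format: no match, so -n suppresses all output.
printf '%s\n' 'RESPONSE_TIME: [2/2125012] (microseconds)' | sed -n "$pattern"
```

Only <^]+> is printed. That explains why the original command left every log line untouched.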
You need to use * (which matches 0 or more occurrences) instead of +, or \+ in GNU sed, or \{1,\} in a generic POSIX BRE.
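The spellings of "one or more" can be compared on the sample line. This sketch assumes GNU sed: the \+ variant is a GNU extension, and -E for ERE is supported by GNU and BSD sed. It also applies the stricter [0-9]+/[0-9]+ pattern the question asked for, so a bracketed value such as [-] is not extracted. The variable name line is hypothetical.

```shell
#!/bin/sh
# Sample line from the question, held in a hypothetical variable "line".
line='1.2.3.4:443 - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"'

# Generic POSIX BRE: "one or more" is \{1,\}; ] needs no escape outside [...]
printf '%s\n' "$line" |
  sed -n 's#.*RESPONSE_TIME: \[\([0-9]\{1,\}/[0-9]\{1,\}\)].*#\1#p'

# GNU sed BRE extension: \+ means "one or more"
printf '%s\n' "$line" |
  sed -n 's#.*RESPONSE_TIME: \[\([0-9]\+/[0-9]\+\)].*#\1#p'

# ERE (-E): + and ( ) are special without backslashes; [ is still escaped
printf '%s\n' "$line" |
  sed -E -n 's#.*RESPONSE_TIME: \[([0-9]+/[0-9]+)].*#\1#p'

# The digit-only pattern rejects a value such as [-], so this prints nothing
printf '%s\n' 'RESPONSE_TIME: [-]' |
  sed -E -n 's#.*RESPONSE_TIME: \[([0-9]+/[0-9]+)].*#\1#p'
```

Each of the first three commands should print 2/2125012. The ERE form is usually the easiest to read when the sed in use supports -E.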
You may fix the sed command by using
sed -n 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#p' testfile
Details
-n - suppresses the default line output
.*RESPONSE_TIME: \[\([^]]*\).* - matches any 0+ chars, RESPONSE_TIME:, a space, [, then captures into Group 1 any zero or more chars other than ], and then matches the rest of the string
\1 - replaces the match with the Group 1 value
p - prints the result of the substitution.
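The -n option and the p flag matter when the file contains lines without the field. A plain s command only rewrites matching lines and prints every other line unchanged. This minimal sketch assumes any POSIX sed and uses two made-up input lines. The sample() helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper that emits two made-up input lines:
# one containing the field and one without it.
sample() {
  printf '%s\n' \
    'SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds)' \
    'a line without the field'
}

# Without -n and p: the matching line is rewritten, the other is echoed as-is.
sample | sed 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#'

# With -n and p (the answer's command): only substituted lines are printed.
sample | sed -n 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#p'
```

The first command prints 2/2125012 followed by the untouched second line. The second command prints only 2/2125012.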
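Alternatively, with awk, use [ and ] as field separators. In the sample line the response time is then field 14:
$ awk -F'[][]' '{print $14}' file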
2/2125012
If that's not all you need, then edit your question to provide more truly representative sample input/output, including cases that the above doesn't work for.
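The fixed index 14 depends on how many bracketed fields come before RESPONSE_TIME. One possible variation, which is not part of the answer above, keeps the same [][] separator but looks for the field labelled "RESPONSE_TIME: " and prints the field after it. This is a sketch under the assumption that the label text always appears exactly as in the sample line. The variable name line is hypothetical.

```shell
#!/bin/sh
# Sample line from the question, held in a hypothetical variable "line".
line='1.2.3.4:443 - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"'

# Same field separators as the answer ([ or ]), but instead of hard-coding
# $14, find the field that ends with the label "RESPONSE_TIME: " and print
# the field after it (the text between the following [ and ]).
printf '%s\n' "$line" | awk -F'[][]' '{
  for (i = 1; i < NF; i++)
    if ($i ~ /RESPONSE_TIME: $/) { print $(i + 1); break }
}'
```

Lines without the label produce no output. The index-based version would instead print whatever happens to be in field 14.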