Extract regex groups from text with bash and output to file

I need to scan a log file and extract the relevant parts from it to another file. The log format is:

   [hh:mm:ss] Header
   [hh:mm:ss] irrelevant text
   [hh:mm:ss] irrelevant text
   [hh:mm:ss]Error text
   [hh:mm:ss] some details
   [hh:mm:ss] end_error;
   [hh:mm:ss] irrelevant text
   [hh:mm:ss] Warning text
   [hh:mm:ss] some details
   [hh:mm:ss] end_warning;
   [hh:mm:ss] irrelevant text
   [hh:mm:ss] irrelevant text
   [hh:mm:ss]Error text
   [hh:mm:ss] some details
   [hh:mm:ss] end_error;

I need to get all occurrences of Error and Warning and capture the following text:

[hh:mm:ss]Error text
[hh:mm:ss] some details
[hh:mm:ss] end_error;
[hh:mm:ss] Warning text
[hh:mm:ss] some details
[hh:mm:ss] end_warning;
[hh:mm:ss]Error text
[hh:mm:ss] some details
[hh:mm:ss] end_error;

What is the simplest way to achieve this in bash?

$ awk '/^(Error|Warning)/{f=1} f; /;/{f=0}' file
Error text
end_error;
Warning text
end_warning;

Your original input file showed Error and Warning at the start of each line, so my script above has a start-of-line anchor (^) in it. Using your latest posted sample input file and desired output, you'd need:

$ awk '
   /^[[:space:]]*\[[^]]+\][[:space:]]*(Error|Warning)/ { found=1 }
   found { sub(/^[[:space:]]+/,""); print }
   /;/ { found=0 }
' file
[hh:mm:ss]Error text
[hh:mm:ss] some details
[hh:mm:ss] end_error;
[hh:mm:ss] Warning text
[hh:mm:ss] some details
[hh:mm:ss] end_warning;
[hh:mm:ss]Error text
[hh:mm:ss] some details
[hh:mm:ss] end_error;

The regexp is this complex to avoid false matches if the words Error or Warning appear elsewhere in your input file: it only matches them directly after the bracketed timestamp at the start of a line.
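To try this end to end, a sketch along the following lines builds a small sample log and writes the extracted blocks to a second file, as the question asks. The file names sample.log and errors.log, the temporary directory, and the concrete timestamps are assumptions made up for illustration. The second input line mentions Error in the middle of its text and is deliberately not picked up.

```shell
#!/usr/bin/env bash
# Hypothetical working directory and file names, chosen for this example only.
workdir=$(mktemp -d)
log="$workdir/sample.log"
out="$workdir/errors.log"

# Sample input modelled on the question's format (timestamps are invented).
cat > "$log" <<'EOF'
   [10:00:00] Header
   [10:00:01] text that mentions Error in passing
   [10:00:02]Error text
   [10:00:03] some details
   [10:00:04] end_error;
   [10:00:05] irrelevant text
   [10:00:06] Warning text
   [10:00:07] some details
   [10:00:08] end_warning;
   [10:00:09] irrelevant text
EOF

# Same logic as the script above, with the result redirected to a file:
# set the flag on a start line, print while it is set, clear it on a ";" line.
awk '
   /^[[:space:]]*\[[^]]+\][[:space:]]*(Error|Warning)/ { found=1 }
   found { sub(/^[[:space:]]+/,""); print }
   /;/ { found=0 }
' "$log" > "$out"

cat "$out"
```

Only the Error block and the Warning block end up in errors.log; the line that merely mentions Error stays out because the word does not directly follow the timestamp.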

This uses the GNU sed range operator with the -n option to suppress default printing and the -r option to enable extended regular expressions. The p command prints every line inside the matched range.

$ sed -nr '/^(Error|Warning)/,/;/p' file
Error text
end_error;
Warning text
end_warning;
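The same range idea can be adapted to the timestamped format from the updated question. The following is only a sketch: the sample input and its timestamps are invented, and it uses -E, which both GNU sed and BSD sed accept as the extended-regexp switch, in place of -r.

```shell
#!/usr/bin/env bash
# Sample input in the timestamped format (timestamps are invented).
sample='   [10:00:00] Header
   [10:00:01] irrelevant text
   [10:00:02]Error text
   [10:00:03] some details
   [10:00:04] end_error;
   [10:00:05] irrelevant text'

# -n suppresses default printing, -E enables extended regexps.
# Inside the range, strip leading whitespace and then print the line.
sed_out=$(printf '%s\n' "$sample" |
  sed -nE '/^[[:space:]]*\[[^]]+\][[:space:]]*(Error|Warning)/,/;/{
    s/^[[:space:]]+//
    p
  }')

printf '%s\n' "$sed_out"
```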

You can do the same in awk with a range pattern, but Ed's flag-based approach is almost always recommended.

$ awk '/^(Error|Warning)/,/;/' file
Error text
end_error;
Warning text
end_warning;
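One reason often given for preferring the flag-based form is that it is easier to adjust than a range pattern. As a sketch (the sample input is made up), moving the flag reset ahead of the print test drops the closing end_... line from each block; with a range pattern the same change would mean testing the end condition a second time inside the action.

```shell
#!/usr/bin/env bash
# Invented sample input in the simple, un-timestamped format.
input='Error text
some details
end_error;
irrelevant text
Warning text
end_warning;'

# The flag is cleared before the bare "f" print test runs,
# so the line containing ";" is no longer printed.
flag_out=$(printf '%s\n' "$input" |
  awk '/^(Error|Warning)/{f=1} /;/{f=0} f')

printf '%s\n' "$flag_out"
```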

Try:

cat file | awk '/^(Error|Warning)/,/;$/ { print $0 }' > output

This pipes the file through awk, which prints each run of lines from one starting with Error or Warning up to the first line ending with ; and the result is saved in output.
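As a side note, awk can read the file itself, so the cat is optional. The sketch below (file names and sample text are assumptions for illustration) also shows what the $ in the end pattern buys you: in an awk range, a line that matches both the start and the end pattern forms a one-line range, so an unanchored /;/ stops early when the Error line itself contains a semicolon.

```shell
#!/usr/bin/env bash
# Hypothetical temporary directory and file names for this example.
workdir=$(mktemp -d)
cat > "$workdir/file" <<'EOF'
Error text; see below
some details
end_error;
irrelevant text
EOF

# awk reads the file directly; the end pattern only matches a trailing ";".
awk '/^(Error|Warning)/,/;$/' "$workdir/file" > "$workdir/output"

# For comparison: with an unanchored end pattern the first line matches
# both patterns, so the range is that single line.
awk '/^(Error|Warning)/,/;/' "$workdir/file" > "$workdir/output_unanchored"

cat "$workdir/output"
```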
