Extract lines between two expressions of a file inside a bash script (using regexp, sed)

I have a log file with many lines. I need to extract the lines from session start to session end using a bash script, for further analysis.

...
...

## TSM-INSTALL SESSION (pid) started at yyyy/mm/dd hh:mm:ss for host (variable) ##
...
...
...
...
...
...
...
## TSM-INSTALL SESSION (pid) ended at yyyy/mm/dd hh:mm:ss for host (variable) ##

...
...

I've googled and found a sed expression to extract the lines:

sed '/start_pattern_here/,/end_pattern_here/!d' inputfile

But I'm unable to find a correct regular expression pattern to extract the info.

I'm pretty new to regular expressions. Here are all the expressions (silly ones too) that I've tried inside the script:

sed '/\.* started at \.* $server ##/,/\.* ended at \.* $server ##/!d' file

sed '/## TSM-INSTALL SESSION [0-9]\+ started at [0-9|\\|:]\+ for host $server ##/,/## TSM-INSTALL SESSION [0-9]\+ ended at [0-9|\\|:]\+ for host $server ##/!d' file

sed '/.\{30\}started{34\}$server ##$/,/.\{30\}ended{34\}$server ##$/!d' file

sed '/.## TSM-INSTALL SESSION\{6\}started at\{31\}$server ##$/,/.## TSM-INSTALL SESSION\{6\}ended at\{31\}$server ##$/!d' file

sed '/## TSM-INSTALL SESSION [0-9]+ started at .* $server/,/## TSM-INSTALL SESSION [0-9]+ ended at .* $server/!d' file

sed '/## TSM-INSTALL SESSION \.\.\.\.\. started at \.\.\.\.\.\.\.\.\.\. \.\.\.\.\.\.\.\. for host $server ##/,/## TSM-INSTALL SESSION \.\.\.\.\. ended at \.\.\.\.\.\.\.\.\.\. \.\.\.\.\.\.\.\. for host $server ##/!d' file

Why not:

sed "/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/!d" file

You don't need to get fancy with the regexps. All you care about is the leading TSM-INSTALL SESSION, the started or ended, and the hostname, so use .* to mean "whatever is in between". Note the double quotes around the sed script: inside single quotes the shell does not expand $server, so sed never sees the host name.
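To see the range address in action, here is a minimal sketch that builds a small sample log and extracts one session from it. The log contents, the host names alpha and beta, and the temporary file are made up for illustration; only the sed expression comes from the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical sample log: two sessions for different hosts, plus unrelated lines.
log=$(mktemp)
cat > "$log" <<'EOF'
noise before
## TSM-INSTALL SESSION 1234 started at 2012/05/01 10:00:00 for host alpha ##
alpha line 1
alpha line 2
## TSM-INSTALL SESSION 1234 ended at 2012/05/01 10:05:00 for host alpha ##
noise between
## TSM-INSTALL SESSION 5678 started at 2012/05/01 11:00:00 for host beta ##
beta line 1
## TSM-INSTALL SESSION 5678 ended at 2012/05/01 11:02:00 for host beta ##
noise after
EOF

server=alpha   # assumed host name, for illustration only

# Double quotes: the shell expands $server before sed parses the script.
session=$(sed "/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/!d" "$log")

# Single quotes: sed receives the text $server unexpanded, no line matches,
# and every line is deleted.
nothing=$(sed '/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/!d' "$log")

printf '%s\n' "$session"
rm -f "$log"
```

In a script this runs as written. At an interactive bash prompt, history expansion may act on the !d inside double quotes, so there it can be easier to use the -n and p form instead.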

If you stick this in a file called file.sed

/^## TSM-INSTALL SESSION ([0-9][0-9]*) started at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/,/^## TSM-INSTALL SESSION ([0-9][0-9]*) ended at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/p

and then call it like this:

sed -n -f file.sed inputfile 

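I think it will do what you want. In sed's default basic regular expressions, unescaped parentheses match literally, so this pattern expects the pid and the host name to appear in parentheses in the log; drop the parentheses from the pattern if your log has none.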

The -n stops sed from printing every line by default, so only the lines selected by the expression (and printed by its p command) appear in the output.
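As a quick check, the sketch below runs the same range once from a script file with -n and p, once with !d, and once with p but without -n, then prints the first result. The sample log, the host name alpha, and the temporary files are assumptions for illustration, and the pattern is the simpler .* form rather than the full one in file.sed.

```shell
#!/usr/bin/env bash
# Hypothetical sample log with a single session.
log=$(mktemp)
script=$(mktemp)
cat > "$log" <<'EOF'
noise before
## TSM-INSTALL SESSION 1234 started at 2012/05/01 10:00:00 for host alpha ##
alpha line 1
## TSM-INSTALL SESSION 1234 ended at 2012/05/01 10:05:00 for host alpha ##
noise after
EOF

# A sed script file: print the lines inside the range.
cat > "$script" <<'EOF'
/^## TSM-INSTALL SESSION .* started .* alpha ##/,/^## TSM-INSTALL SESSION .* ended .* alpha ##/p
EOF

# -n suppresses automatic printing, so only the p command produces output.
with_p=$(sed -n -f "$script" "$log")

# Same range, but deleting everything outside it instead.
with_d=$(sed '/^## TSM-INSTALL SESSION .* started .* alpha ##/,/^## TSM-INSTALL SESSION .* ended .* alpha ##/!d' "$log")

# Without -n every line is printed once, and the range lines a second time.
without_n=$(sed -f "$script" "$log")

printf '%s\n' "$with_p"
rm -f "$log" "$script"
```

The first two forms select the same lines; which one to use is mostly a matter of taste, though the p form avoids the ! character in the command.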
