
sed matching multiple line pattern

I have a log in the following format:

<<
[ABC] some other data
some other data
>>

<<
DEF some other data
some other data
>>

<<
[ABC] some other data
some other data
>>

I want to select all log entries that contain [ABC]. The expected result is:

<<
[ABC] some other data
some other data
>>

<<
[ABC] some other data
some other data
>>

What would the sed expression for this be? To fetch the contents between << and >>, the expression is:

sed -e '/<</,/>>/!d' 

But how can I require [ABC] to appear in between?

This might work for you:

sed '/^<</,/^>>/{/^<</{h;d};H;/^>>/{x;/^<<\n\[ABC\]/p}};d' file
<<
[ABC] some other data
some other data
>>
<<
[ABC] some other data
some other data
>>

sed comes equipped with a register called the hold space (HS).

You can use the HS to collect data of interest, in this case the lines between /^<</ and /^>>/.

h replaces whatever is in the HS with the contents of the pattern space (PS).

H appends a newline (\n) and then the PS to the HS.

x swaps the HS and the PS.
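To make the roles of h, H and x easier to follow, here is the same one-liner spread over several lines with comments. This is only a sketch: the function name filter_abc and the log variable are made up for illustration, and the sample data is copied from the question.

```shell
#!/bin/sh
# Sample log, same shape as in the question.
log='<<
[ABC] some other data
some other data
>>

<<
DEF some other data
some other data
>>

<<
[ABC] some other data
some other data
>>'

# filter_abc is a hypothetical wrapper name; the sed script is the
# one-liner from this answer, reformatted and commented.
filter_abc() {
  sed '
    /^<</,/^>>/{
      # Opening line: start a fresh hold space with "<<", print nothing yet.
      /^<</{
        h
        d
      }
      # Any other line in the block: append it to the hold space.
      H
      # Closing line: swap the collected block into the pattern space
      # and print it only if the line after "<<" starts with [ABC].
      /^>>/{
        x
        /^<<\n\[ABC\]/p
      }
    }
    # Suppress the default output for every line.
    d
  '
}

printf '%s\n' "$log" | filter_abc
```

Because the test /^<<\n\[ABC\]/ runs against the whole collected block, [ABC] has to be at the start of the first line after <<; a block that mentions [ABC] only further down is not selected.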

NB: This deletes all lines other than those in <<...>> blocks containing [ABC]. If you want to retain the lines outside the blocks, move the final d inside the braces:

sed '/^<</,/^>>/{/^<</{h;d};H;/^>>/{x;/^<<\n\[ABC\]/p};d}' file
<<
[ABC] some other data
some other data
>>


<<
[ABC] some other data
some other data
>>

This works on my side:

awk '$0~/ABC/{print "<<";print;getline;print;getline;print }' temp.txt

Tested as below:

pearl.242> cat temp.txt
<< 
[ABC] some other data 
some other data 
>>  
<< 
DEF some other data 
some other data 
>>  

nkeem

<< 
[ABC] some other data 
some other data 
>> 
pearl.243> awk '$0~/ABC/{print "<<";print;getline;print;getline;print }' temp.txt
<<
[ABC] some other data 
some other data 
>>  
<<
[ABC] some other data 
some other data 
>> 
pearl.244> 

If you do not want to hard-code the print "<<"; statement, you can use the version below, which remembers the previous line in x:

pearl.249> awk '$0~/ABC/{print x;print;getline;print;getline;print}{x=$0}' temp.txt
<< 
[ABC] some other data 
some other data 
>>  
<< 
[ABC] some other data 
some other data 
>> 
pearl.250> 
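Both commands above assume that exactly two lines follow the [ABC] line. If the blocks can vary in length, one possible variation is to keep printing until the closing >> instead of calling getline a fixed number of times. A rough sketch, assuming ABC appears only on the first line of a block (print_abc_blocks, sample, prev and show are names invented for this example):

```shell
#!/bin/sh
# Made-up sample: one three-line [ABC] block and one DEF block.
sample='<<
[ABC] first line
second line
third line
>>
<<
DEF first line
second line
>>'

# print_abc_blocks is a hypothetical helper name.
print_abc_blocks() {
  awk '
    # A line containing ABC: emit the remembered previous line ("<<")
    # and start printing.
    /ABC/ { print prev; show = 1 }
    # While the flag is set, print the current line.
    show  { print }
    # The closing marker ends the block (it has already been printed).
    /^>>/ { show = 0 }
    # Remember every line so the "<<" line is available later.
    { prev = $0 }
  '
}

printf '%s\n' "$sample" | print_abc_blocks
```

The order of the rules matters: the closing >> line is printed by the show rule before the flag is cleared.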

To me, sed is line based. You can probably talk it into being multi line, but it would be easier to start the job with awk or perl rather than trying to do it in sed.

I'd use perl and make a little state machine like this pseudocode (I don't guarantee it'll catch every little detail of what you are trying to achieve):

state = 0
for each line
    if state == 0
        if line == '<<'
            buffer = line
            state = 1
    else if state == 1
        if line starts with '[ABC]'
            buffer += line
            state = 2
        else
            state = 0
    else if state == 2
        buffer += line
        if line == '>>'
            do something with buffer
            state = 0
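For anyone who wants something to run, the state machine can be sketched in awk (used here instead of perl only because awk already appears elsewhere in this thread). Assumptions that go beyond the pseudocode: the buffer also keeps the << and >> lines, a block whose first line is not [ABC] sends the machine back to state 0, and "do something with buffer" is simply printing it. The names select_abc_blocks and log are invented for the example.

```shell
#!/bin/sh
# Sample log, same shape as in the question.
log='<<
[ABC] some other data
some other data
>>

<<
DEF some other data
some other data
>>

<<
[ABC] some other data
some other data
>>'

# select_abc_blocks is a hypothetical helper name.
select_abc_blocks() {
  awk '
    BEGIN { state = 0 }
    # State 0: wait for an opening "<<" line.
    state == 0 {
      if ($0 ~ /^<</) { buffer = $0; state = 1 }
      next
    }
    # State 1: the line right after "<<" decides whether to keep the block.
    state == 1 {
      if ($0 ~ /^\[ABC\]/) { buffer = buffer "\n" $0; state = 2 }
      else                 { state = 0 }
      next
    }
    # State 2: collect lines until the closing ">>", then print the block.
    state == 2 {
      buffer = buffer "\n" $0
      if ($0 ~ /^>>/) { print buffer; state = 0 }
    }
  '
}

printf '%s\n' "$log" | select_abc_blocks
```

Each state ends with next (or is the last rule), so a single input line is never processed by two states in the same pass.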

See also http://www.catonmat.net/blog/awk-one-liners-explained-part-three/ for some hints on how you might do it with awk as a 1 liner...

TXR: built for multi-line stuff.

@(collect)
<<
[ABC] @line1
@line2
>>
@  (output)
>>
[ABC] @line1
@line2
<<

@  (end)
@(end)

Run:

$ txr data.txr  data
>>
[ABC] some other data
some other data
<<

>>
[ABC] some other data
some other data
<<

Very basic stuff; you're probably better off sticking to awk until you have a very complicated multi-line extraction job with irregular data with numerous cases, lots of nesting, etc.

If the log is very large, we should write @(collect :vars ()) so the collect doesn't implicitly accumulate lists; then the job will run in constant memory.

Also, if the logs are not always two lines, it becomes a little more complicated. We can use a nested collect to gather the variable number of lines.

@(collect :vars ())
<<
[ABC] @line1
@  (collect)
@line
@  (until)
>>
@  (end)
@  (output)
>>
[ABC] @line1
@  {line "\n"}
<<

@  (end)
@(end)
