sed: print delimited block of lines if it matches a pattern

I'd like to use sed to match blocks of lines delimited by pattern1/pattern2, and then perform operations (e.g. print the block) only on blocks that contain pattern3.

In the example below, I'm looking for "catch me if you can" inside all blocks delimited by lines matching { and } (and then I want to print the matching blocks in their entirety).

What I've tried:

sed -n -e '/{/,/}/{1h;1!{$!{H;d};H;x;/catch me if you can/p}}'

(The idea is to match blocks delimited by { and }, then accumulate each block into the hold space; at the end of each block, exchange the pattern and hold spaces and test for "catch me if you can".) This doesn't work, because sed treats all the matched blocks together as a single block instead of treating each block individually.
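The attempt collapses everything into one block because the addresses 1 and $ refer to the first and last lines of the whole input, not of each block. A minimal sketch of the same hold-space idea, keyed on the delimiter lines instead (an illustration, not the asker's code; print_matching_blocks is a hypothetical name), assuming no line contains both { and }:

```sh
#!/bin/sh
# Sketch: print each {...} block that contains "catch me if you can".
# print_matching_blocks is a hypothetical helper; it filters stdin.
#   /{/h     opening line: start a fresh block in the hold space
#   /{/!H    any other line of the block: append it to the hold space
#   /}/{..}  closing line: swap the whole block into the pattern space
#            and print it only if it contains the wanted text
# Usage: print_matching_blocks < data
print_matching_blocks() {
    sed -n '
        /{/,/}/{
            /{/h
            /{/!H
            /}/{
                x
                /catch me if you can/p
            }
        }
    '
}
```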

Input data:

"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block2": {
    "bbb": "24680",
    "bar": "blah",
    "foo": "argh",
    "ccc": "135"
},
"block3": {
    "ddd": "zzz"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

Desired output:

"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

Note 1: The order of the fields inside each block is random. The number of fields and the length of the values are not constant across blocks. The field I'm looking for may be missing in some blocks (as opposed to just having a different value).

Note 2: For educational purposes, I'd prefer the solution to use sed, but if that's not possible, awk or bash are fine as well. Please, no perl or other tools.

This is how I'd do it. There are two versions here: one for BSD (Mac OS X) sed (also applicable to other systems not running GNU sed), and one for GNU sed.

BSD sed

$ cat script.bsd-sed
/{/,/}/{
    /{/{ h; b next
    }
    /}/{ H; x; /catch me if you can/p; b next
    }
    H
    :next
}
$ sed -n -f script.bsd-sed data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

The logic is:

  • Don't print anything unless told to do so (-n).
  • Act only on lines in a range from a line containing { to the next line containing }:
  • If the line matches {, copy the pattern space over the hold space and jump to label next.
  • If the line matches }, append it to the hold space; swap the pattern and hold spaces; if the pattern space (previously the hold space) matches your other pattern ('catch me if you can'), print it; jump to label next.
  • Otherwise, append the line to the hold space.

BSD (classic) sed takes the rest of the line after b as the label name, so nothing may follow b next on that line; that is why the } closing each group of actions is on the next line.
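The BSD-compatible script can also be given on the command line without a script file. A sketch, assuming a POSIX-conforming sed: POSIX specifies that the text of multiple -e options is joined with newlines, so each b next still ends at a line break. The function name print_blocks_portable is only illustrative.

```sh
#!/bin/sh
# Sketch: the script.bsd-sed logic passed as separate -e expressions.
# Each -e is joined to the script with a newline, so the label "next"
# is never followed by "}" on the same line.
# print_blocks_portable is a hypothetical helper; it filters stdin.
# Usage: print_blocks_portable < data
print_blocks_portable() {
    sed -n \
        -e '/{/,/}/{' \
        -e '/{/{ h; b next' \
        -e '}' \
        -e '/}/{ H; x; /catch me if you can/p; b next' \
        -e '}' \
        -e 'H' \
        -e ':next' \
        -e '}'
}
```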

GNU sed

$ cat script.gnu-sed 
/{/,/}/{
    /{/{ h; b next }
    /}/{ H; x; /catch me if you can/p; b next }
    H
    :next
}
$ /opt/gnu/bin/sed -n -f script.gnu-sed data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

GNU sed recognizes semicolons or closing braces after the label as terminating the command, so it allows more compact notation. You could even flatten it all onto a single line, though you have to add a few semicolons:

$ /opt/gnu/bin/sed -n -e '/{/,/}/{ /{/{ h; b next }; /}/{ H; x; /catch me if you can/p; b next }; H; :next }' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

You can remove the spaces not in the pattern match too:

$ /opt/gnu/bin/sed -n -e '/{/,/}/{/{/{ h;b next};/}/{H;x;/catch me if you can/p;b next};H;:next}' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

Extended data file (named data):

"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block2": {
    "bbb": "24680",
    "bar": "blah",
    "foo": "argh",
    "ccc": "135"
},
"block3": {
    "ddd": "zzz"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
"block5": [
    "oops": "catch me if you can"
],
"block6": {
    "rhubarb": "dandelion"
}

Using sed

$ sed -n '/^"/{x;/catch/p;d}; ${H;x;/catch/p;d}; H' file
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

How it works

  • -n

    This option tells sed not to print anything unless we ask it to.

  • /^"/{x;/catch/p;d}

    For any line that begins with a quote, this (1) exchanges the pattern and hold spaces, (2) checks whether what is now in the pattern space contains catch and, if so, prints it, and (3) deletes the pattern space, so sed starts over with the next line.

  • ${H;x;/catch/p;d}

    When we reach the last line, we do something similar: we append the last line to the hold space, swap the hold space into the pattern space, check whether it contains catch and, if so, print it. Then the pattern space is deleted.

  • H

    For any other case, the line is appended to the hold space.
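The one-liner can also be laid out one command per line, which makes the three cases above easier to see. This is only a reformatting sketch (print_quoted_blocks is a hypothetical name); like the original, it assumes every block starts on a line that begins with a double quote and that no other line does.

```sh
#!/bin/sh
# Sketch: the "Using sed" one-liner, one command per line.
#   /^"/{...}  a new block starts: swap the finished block into the
#              pattern space, print it if it contains "catch", delete it
#   ${...}     last line: append it, then test the final block the same way
#   H          any other line: append it to the block being collected
# print_quoted_blocks is a hypothetical helper; it filters stdin.
# Usage: print_quoted_blocks < file
print_quoted_blocks() {
    sed -n '
        /^"/{
            x
            /catch/p
            d
        }
        ${
            H
            x
            /catch/p
            d
        }
        H
    '
}
```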

Using awk

$ awk '/catch/{print $0 "},"}' RS='}' file
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
,
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
},
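The output above shows two side effects of splitting records on }: the comma after each }, ends up at the start of the next record (hence the lone , line), and every printed block is closed with },. One possible way around this, sketched here and not part of the original answer (print_blocks_rs is a hypothetical name), is to hold a matching record back until the next record shows whether its } was followed by a comma.

```sh
#!/bin/sh
# Sketch: same RS="}" idea, but each block is closed exactly as in the input.
# print_blocks_rs is a hypothetical helper; it filters stdin.
# Usage: print_blocks_rs < file
print_blocks_rs() {
    awk '
        BEGIN { RS = "}" }
        {
            comma = sub(/^,/, "")          # 1 if the previous closing brace had a comma
            if (pending != "") {           # emit the held block, closed as in the input
                print pending "}" (comma ? "," : "")
                pending = ""
            }
            sub(/^\n/, "")                 # drop the newline left after the brace
            if ($0 ~ /catch me if you can/) pending = $0
        }
        END { if (pending != "") print pending "}" }
    '
}
```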

Improvements

Jonathan Leffler adds the possibility of square-bracket blocks in addition to curly-brace blocks, as shown in his test file data. In that case, for sed, try:

$ sed -n '/^"/{x;/{.*catch/p;d}; ${H;x;/{.*catch/p;d}; H' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

And for awk:

$ awk '{s=(s?s"\n":"") $0} /{/{f=1} f && /catch/{f=2} /^[]}]/{if (f==2) print s; f=0; s=""} ' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
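A variant of the flag-based awk script above that takes the search text as an argument (a sketch; print_blocks_matching and want are hypothetical names). It passes the text with -v and tests it with index(), so it is matched literally rather than as a regular expression; note that -v assignments still process backslash escapes.

```sh
#!/bin/sh
# Sketch: print curly-brace blocks whose lines contain the literal text $1.
# print_blocks_matching is a hypothetical helper; it filters stdin.
# Usage: print_blocks_matching 'catch me if you can' < data
print_blocks_matching() {
    awk -v want="$1" '
        {                                   # collect every line of the current block
            if (s == "") s = $0
            else s = s "\n" $0
        }
        /[{]/ { f = 1 }                     # a line with { opens a curly-brace block
        f && index($0, want) { f = 2 }      # the literal text occurs inside that block
        /^[]}]/ {                           # a line starting with ] or } closes a block
            if (f == 2) print s
            f = 0
            s = ""
        }
    '
}
```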

sed is for simple substitutions on individual lines, that is all. All of its constructs for doing more than s, g, and p (with -n) became obsolete over 40 years ago, when awk was invented.

With GNU awk for multi-char RS and RT:

$ awk -v RS='},?\n' -v ORS= '/catch me if you can/{print $0 RT}' file
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
