Print several lines between patterns (first pattern not unique)

I need help with sed, awk, grep, or whatever tool can solve this. I have a large file and need to extract several runs of consecutive lines from it.

I have a start pattern: <DN>

and an end pattern: </GR>

and several lines in between, like this:

<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>

I've tried this:

sed -n '/\<DN\>/,/\<\/GR\>/p'

and several other variants (using awk and sed). It works, but the source file may also contain blocks that start with <DN> and never reach a closing </GR>, followed later by another <DN> block that does end normally with </GR>:

<DN>234</DN> - unneeded DN
<AB>sdfsd</AB>
<DC>456456</DC>
<EF>6575675 sdfsd</EF>
....really large piece of unwanted text here....

<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>

How can I extract only the needed lines and ignore the garbage pieces of the log that contain <DN> without a closing </GR>?

Next, I need to convert each multiline piece from <DN> to </GR> into a single line, starting with <DN> and ending with </GR>, so that the output file has one line per block. Any help would be appreciated; I'm stuck.

This might work for you (GNU sed):

sed -n '/<DN>/{h;b};x;/./G;x;/<\/GR/{x;/./p;z;x}' file

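This uses the hold space to collect the lines between <DN> and </GR>. A new <DN> overwrites whatever was collected so far, so blocks that never reach </GR> are dropped.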
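Spread over several lines with comments, the same script is easier to follow. The sketch below assumes GNU sed (the z command is a GNU extension). The helper names sample_input and extract_blocks are hypothetical, and the sample data is made up for illustration.

```shell
# Hypothetical sample: one unterminated block, one complete block, then stray lines.
sample_input() {
  cat <<'EOF'
<DN>111</DN> - unneeded DN
<AB>junk</AB>
<DN>234</DN>
<DD>sdfsd</DD>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>trailing</GR>
EOF
}

# Reads stdin, prints every <DN> ... </GR> block (multi-line, as in the input).
extract_blocks() {
  sed -n '
    # A <DN> line starts (or restarts) a block: overwrite the hold space with it.
    /<DN>/{
      h
      b
    }
    # Swap in the collected block; if it is not empty, append the current line to it.
    x
    /./G
    # Swap back: pattern space is the current line, hold space is the block so far.
    x
    # On a </GR line, print the collected block (if any) and empty the hold space.
    /<\/GR/{
      x
      /./p
      z
      x
    }
  '
}

sample_input | extract_blocks
```

On this sample only the second block is printed, blank line included. The first <DN> block is discarded when the second <DN> overwrites the hold space, and the trailing <RAC>/<GR> pair is ignored because no block is open at that point.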

awk '
# Note: no apostrophes may appear in these comments, because the whole
# program sits inside a single-quoted shell string.
# Lines that start with <DN> start our matching.
/^<DN>/ {
    # If we saw a start without a matching end, throw away everything saved so far.
    if (dn) {
        d=""
    }
    # Mark being in a <DN> element.
    dn=1
    # Save the current line.
    d=$0
    next
}

# Lines that end with </GR> end our matching (but only if we are currently in a match).
dn && /<\/GR>$/ {
    # We are no longer in a <DN> element.
    dn=0
    # Print out the saved lines and the current line as one line.
    printf "%s%s%s\n", d, OFS, $0
    # Reset our saved contents.
    d=""
    next
}

# If we are in a <DN> element and have saved contents, append the current line to the contents (separated by OFS).
dn && d {
    d=d OFS $0
}
' file
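
A shorter awk variant that buffers the lines of a block in an array and prints them joined with no separator:

awk '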
  /^<DN>/ { n = 1 }

  n { lines[n++] = $0 }

  n && /<\/GR>$/ {
    for (i=1; i<n; i++) printf "%s", lines[i]
    print ""
    n = 0
  }
' file
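For the second half of the question (one output line per block), the awk answers above can be condensed further. The following is only a sketch under a few assumptions: lines are joined with a single space, blank lines inside a block are skipped, and the names sample_input, join_blocks, buf, and inblock are made up for illustration.

```shell
# Hypothetical sample: one unterminated block, one complete block, then stray lines.
sample_input() {
  cat <<'EOF'
<DN>111</DN> - unneeded DN
<AB>junk</AB>
<DN>234</DN>
<DD>sdfsd</DD>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>trailing</GR>
EOF
}

# Reads stdin, prints each <DN> ... </GR> block as a single space-joined line.
join_blocks() {
  awk '
    # A line starting with <DN> opens a new block and discards any unfinished one.
    /^<DN>/ { buf = $0; inblock = 1; next }
    # Inside a block, append every non-blank line, separated by one space.
    inblock && NF { buf = buf " " $0 }
    # A line ending with </GR> closes the block: print it and stop collecting.
    inblock && /<\/GR>$/ { print buf; inblock = 0 }
  '
}

sample_input | join_blocks
```

Because inblock is cleared after the first </GR>, the later <RAC>/<GR> pair that follows a finished block is not printed.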

With bash (this prints each block on multiple lines, as in the input):

fun () 
{ 
    local line output;
    while IFS= read -r line; do
        if [[ $line =~ ^'<DN>' ]]; then
            output=$line;
        else
            if [[ -n $output ]]; then
                output=$output$'\n'$line;
                if [[ $line =~ '</GR>'$ ]]; then
                    echo "$output";
                    output=;
                fi;
            fi;
        fi;
    done
}

fun <file
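The function above keeps the newlines inside each block. If one line per block is wanted, as the question asks, a variation along the following lines might do. It is a sketch, not the answer's own code: it swaps the [[ =~ ]] tests for case patterns so it also runs under a plain POSIX sh, joins with a single space, and drops blank lines. The names sample_input and join_dn_gr are hypothetical.

```shell
# Hypothetical sample: one unterminated block, one complete block, then stray lines.
sample_input() {
  cat <<'EOF'
<DN>111</DN> - unneeded DN
<AB>junk</AB>
<DN>234</DN>
<DD>sdfsd</DD>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>trailing</GR>
EOF
}

# Reads stdin, prints each <DN> ... </GR> block as one space-joined line.
# Note: out and line are plain (global) variables, since POSIX sh has no "local".
join_dn_gr() {
  out=
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      # A line starting with <DN> (re)starts the buffer, dropping any unfinished block.
      '<DN>'*) out=$line; continue ;;
    esac
    # Ignore everything outside an open block.
    [ -n "$out" ] || continue
    # Append non-empty lines to the buffer.
    if [ -n "$line" ]; then
      out="$out $line"
    fi
    case $line in
      # A line ending with </GR> closes the block: print it and clear the buffer.
      *'</GR>') printf '%s\n' "$out"; out= ;;
    esac
  done
}

sample_input | join_dn_gr
```

It would be called the same way as fun, for example join_dn_gr <file.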

You could use the pcregrep tool for this.

$ pcregrep -o -M '(?s)(?<=^|\s)<DN>(?:(?!<DN>).)*?</GR>(?=\n|$)' file
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>

<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
