
Using sed/awk to remove string from subsections

I have a file that looks like this:

bar
barfo
barfoo
barfooo
barfoooo

sample
sampleText1
sampleText2
sampleText3

prefix
prefixFooBar
prefixBarFoo

What I want sed (or awk) to do is to remove the string which introduces a section, from all of its contents, so that I end up with:

bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

I tried using

sed -e -i '/([[:alpha:]]+)/,/^$/ s/\1//g' file

But that fails with "Invalid Backreference".

$ awk '{$0=substr($0,idx)} !idx{idx=length($0)+1} !NF{idx=0} 1' file
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

Another awk solution:

$ awk '{sub(pre,"")}1; !NF{pre=""} !pre{pre=$1}' file

bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

perl -ple'
   if (!length($_)) { $re = "" }
   elsif (!length($re)) { $re = $_ }
   else { s/^\Q$re// }
'

Notes:

  • Use s/\Q$re//g to remove the string anywhere in the line instead of only at the start.
  • This works even if the header line includes special characters such as \ , . and * .
  • This works even if there are multiple blank lines in a row.
  • See Specifying file to process to Perl one-liner for complete usage.
  • The line breaks in the code are optional (i.e. they can be removed).

A sed solution, mostly to illustrate that sed is probably not the best choice to do this:

$ sed -E '1{h;b};/^$/{n;h;b};G;s/^(.*)(.*)\n\1$/\2/' infile
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

Here is how it works:

1 {                   # on the first line
  h                   # copy pattern buffer to hold buffer
  b                   # skip to end of cycle
}
/^$/ {                # if line is empty
  n                   # get next line into pattern buffer
  h                   # copy pattern buffer to hold buffer
  b                   # skip to end of cycle
}
G                     # append hold buffer to pattern buffer
s/^(.*)(.*)\n\1$/\2/  # substitute

The complex part is in the substitution. Before the substitution, the pattern buffer holds something like this:

prefixFooBar\nprefix

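The substitution matches two capture groups. The first is pinned down by the back-reference \1, which has to match what sits between \n and the end of the string: the prefix we fetched from the hold buffer. The second group is whatever follows that prefix on the original line.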

The replacement is then the rest of the original line, with the prefix removed.
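
The substitution can be tried in isolation. The sketch below is an assumption-laden stand-in rather than the full script: it uses N to join two input lines with a newline, which produces the same pattern buffer contents that G produces in the real solution. The function name strip_joined_prefix is made up for the demonstration.

```shell
# Hypothetical helper: joins "line" and "prefix" with \n (via N) and then
# applies the same substitution as the full script.
strip_joined_prefix() {
  sed -E 'N;s/^(.*)(.*)\n\1$/\2/'
}

# Pattern buffer before s: prefixFooBar\nprefix
printf 'prefixFooBar\nprefix\n' | strip_joined_prefix
```

Here the only way for \1 to match the text after the newline is for the first group to be prefix, which leaves FooBar for the second group and therefore as the result.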

Remarks:

  • This works with GNU sed; older GNU sed versions might need -r instead of -E.
  • -E is just for convenience; without it, the substitution would look like

     s/^\(.*\)\(.*\)\n\1$/\2/

    but still work.

  • For macOS sed, it works with literal linebreaks between commands:

     sed -E '1{
     h
     b
     }
     /^$/{
     n
     h
     b
     }
     G
     s/^(.*)(.*)\n\1$/\2/' infile

Here's another sed solution. It works only if all strings in a paragraph start with the subject line.

sed -e '1{h;b};/^$/{n;h;b};H;g;s/\(.*\)\n\1//;p;g;s/\n.*//;h;d' file

  • 1 first line: h copy to hold space, b print and continue with next line
  • /^$/ empty lines: n print it and read next line, h copy to hold space, b print and continue
  • all (the other) lines:
    • H append to hold space with newline
    • g copy hold space to pattern space
    • s/\(.*\)\n\1// remove the first line and its copy at the start of the second line from pattern space
    • p print pattern space
    • g copy hold space to pattern space in order to remove the new contents from H
    • s/\n.*// remove the new contents
    • h copy back to hold space
    • d delete pattern space

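sed is not well suited to this kind of task.

You get 'Invalid back reference' because there's no capture group in the search pattern of s . A back-reference such as \1 can only refer to a group opened in the same pattern; it cannot reach into the address regex in front of the command.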
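
The difference can be seen with two tiny commands. This is only a sketch: the function names and the abcabc sample input are invented for illustration, and the exact wording of the error message depends on the sed implementation, so the example only checks that sed refuses the second script.

```shell
# Accepted: \1 refers to the group opened inside the same s pattern.
mark_doubled() {
  sed 's/\(abc\)\1/[\1 twice]/'
}

# Rejected: the group lives in the address regex, and the s pattern
# itself contains no group for \1 to refer to.
backref_to_address() {
  sed '/\(abc\)/ s/\1//' 2>/dev/null
}

printf 'abcabc\n' | mark_doubled
printf 'abcabc\n' | backref_to_address || echo "sed rejected the script"
```

The first call rewrites the doubled abc, while the second never processes any input because sed exits with an error while compiling the script.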

Another in awk:

$ awk '{if(p&&match($0,"^" p))$0=substr($0,RLENGTH+1);else p=$0}1' file

Output:

bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

Here's another awk solution:

awk '{gsub(s,"")}1; s==""||!NF{s=$0}' file

Pros:

  • Matches are replaced, wherever they are
  • All matches are replaced
  • Head line may evaluate to 0 / false .
  • Head line may contain whitespace

Cons:

  • Head line must not contain regular expression meta chars
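
If the head line may contain regular expression meta chars, one possible workaround is to compare strings literally with index() and substr() instead of gsub(). The sketch below is a variation under stated assumptions, not the answer's own code: the function name strip_literal_prefix_awk is hypothetical, and unlike the gsub version it removes the head line only at the start of each line.

```shell
# Hypothetical helper: literal (non-regex) prefix removal per section.
strip_literal_prefix_awk() {
  awk '
    !NF                 { pre = ""; print; next }   # blank line ends the section
    pre == ""           { pre = $0; print; next }   # first line of a section is its head line
    index($0, pre) == 1 { $0 = substr($0, length(pre) + 1) }  # strip a literal leading match
                        { print }
  ' "$@"
}

# "a.b" is matched as plain text, so "axbY" is not modified.
printf 'a.b\na.bX\naxbY\n\nfoo\nfoobar\n' | strip_literal_prefix_awk
```

Because index() does plain string search, the . in a.b no longer matches arbitrary characters, at the cost of giving up the "replaced wherever they are" behaviour listed under Pros.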

This might work for you (GNU sed):

sed 'G;s/^\(.\+\)\(.*\)\n\1$/\2/;t;s/\n.*//;h' file

Append the previous key (empty on the first line) to the current line. If the line starts with that key, remove both the leading key and the appended copy, print what remains and start the next cycle. Otherwise strip the appended key, store the current line in the hold space as the new key and print it.
