I have a file that looks like this:
bar
barfo
barfoo
barfooo
barfoooo

sample
sampleText1
sampleText2
sampleText3

prefix
prefixFooBar
prefixBarFoo
What I want sed (or awk) to do is to remove the string which introduces a section, from all of its contents, so that I end up with:
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
I tried using
sed -e -i '/([[:alpha:]]+)/,/^$/ s/\1//g' file
But that fails with "Invalid Backreference".
$ awk '{$0=substr($0,idx)} !idx{idx=length($0)+1} !NF{idx=0} 1' file
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
Another awk:
$ awk '{sub(pre,"")}1; !NF{pre=""} !pre{pre=$1}' file
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
perl -ple'
if (!length($_)) { $re = "" }
elsif (!length($re)) { $re = $_ }
else { s/^\Q$re// }
'
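Notes:
Use s/\Q$re//g to remove the string anywhere in the line instead of just removing the prefix.
\Q quotes the header, so it is matched literally even if it contains special characters such as \, . and *.
A sed solution, mostly to illustrate that sed is probably not the best choice to do this: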
$ sed -E '1{h;b};/^$/{n;h;b};G;s/^(.*)(.*)\n\1$/\2/' infile
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
Here is how it works:
1 {                     # on the first line
    h                   # copy pattern buffer to hold buffer
    b                   # skip to end of cycle
}
/^$/ {                  # if line is empty
    n                   # get next line into pattern buffer
    h                   # copy pattern buffer to hold buffer
    b                   # skip to end of cycle
}
G                       # append hold buffer to pattern buffer
s/^(.*)(.*)\n\1$/\2/    # substitute
The complex part is in the substitution. Before the substitution, the pattern buffer holds something like this:
prefixFooBar\nprefix
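The substitution matches two capture groups. The first must equal what sits between \n and the end of the string (that is what the back-reference \1 checks), which is the prefix we fetched from the hold buffer. The second captures whatever follows the prefix on the original line.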
The replacement is then the rest of the original line, with the prefix removed.
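To see the buffer described above for yourself, sed's l command prints the pattern buffer unambiguously (a newline is shown as \n and the end is marked with $). This is only a small sketch; the function name show_buffer is made up for illustration, and the two-line input stands in for a section header followed by one content line.

```shell
# Print the pattern buffer right after G has appended the hold buffer.
# show_buffer is a hypothetical helper name, not part of the answer above.
show_buffer() {
  # line 1: store the header in the hold buffer and start the next cycle
  # other lines: append the held header, then list the buffer with l
  sed -n '1{h;d;};G;l'
}

printf 'prefix\nprefixFooBar\n' | show_buffer
# prints: prefixFooBar\nprefix$
```

The \n in that output is the separator that G inserts, which is what the substitution anchors on.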
Remarks:
Older GNU sed versions might need -r instead of -E.
-E is just for convenience; without it, the substitution would look like s/^\(.*\)\(.*\)\n\1$/\2/ but still work.
For macOS sed, it works with literal linebreaks between commands:
sed -E '1{
h
b
}
/^$/{
n
h
b
}
G
s/^(.*)(.*)\n\1$/\2/' infile
Here's another sed solution. It works only if all strings in a paragraph start with the subject line.
sed -e '1{h;b};/^$/{n;h;b};H;g;s/\(.*\)\n\1//;p;g;s/\n.*//;h;d' file
1: first line
  h: copy to hold space
  b: print and continue with next line
/^$/: empty lines
  n: print it and read next line
  h: copy to hold space
  b: print and continue
H: append to hold space with newline
g: copy hold space to pattern space
s/\(.*\)\n\1//: remove the first line, and its contents in the second line, from pattern space
p: print pattern space
g: copy hold space to pattern space in order to remove the new contents from H
s/\n.*//: remove the new contents
h: copy back to hold space
d: delete pattern space
sed is not useful for these things.
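You get 'Invalid back reference' because there's no group in the search pattern of s. A back-reference such as \1 can only point at a group opened earlier in the same regular expression; the group in the address /([[:alpha:]]+)/ does not carry over into the s command.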
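A small sketch of that rule, assuming a sed that accepts -E. The first command mirrors the shape of the failing attempt and should be rejected; the second puts the group and its back-reference in the same s pattern. The function name strip_repeat and the sample strings are made up for illustration, and the second command solves a simpler task (dropping an immediately repeated word) than the one in the question.

```shell
# Rejected: \1 points at a group that exists only in the address regex.
printf 'barbar\n' | sed -E '/(bar)/ s/\1//' 2>/dev/null || echo 'sed rejected the script'

# Accepted: the group and \1 are both inside the s pattern.
# strip_repeat is a hypothetical helper name.
strip_repeat() {
  sed -E 's/^([[:alpha:]]+)\1/\1/'
}

printf 'barbarfoo\n' | strip_repeat
# prints: barfoo
```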
Another in awk:
$ awk '{if(p&&match($0,"^" p))$0=substr($0,RLENGTH+1);else p=$0}1' file
Output:
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
Here's another awk solution:
awk '{gsub(s,"")}1; s==""||!NF{s=$0}' file
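Pros: a section header of 0 (which awk treats as false) is handled, because the test is s=="" rather than !s.
Cons: gsub treats s as a regular expression and removes every occurrence in the line, not only the leading one.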
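Because gsub(s,"") interprets the header as a regular expression, a header such as a.c would also match abc. A possible variant, sketched here under the assumption that sections are separated by empty lines, compares the prefix literally with index() and cuts it with substr(). The names strip_literal_prefix and hdr are made up for illustration.

```shell
# Strip each section's header from the start of its content lines,
# comparing literally instead of as a regular expression.
# strip_literal_prefix and hdr are hypothetical names.
strip_literal_prefix() {
  awk '
    # start of a section or an empty separator line: (re)set hdr, print unchanged
    hdr == "" || !NF { hdr = $0; print; next }
    # content line that literally starts with the header: cut the header off
    index($0, hdr) == 1 { $0 = substr($0, length(hdr) + 1) }
    1
  ' "$@"
}

printf 'a.c\na.cX\nabcX\n' | strip_literal_prefix
# prints a.c, X and abcX on separate lines: abcX is left alone because
# the dot is compared literally
```

It can be run on a file as strip_literal_prefix file, or fed from a pipe as shown.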
This might work for you (GNU sed):
sed 'G;s/^\(.\+\)\(.*\)\n\1$/\2/;t;s/\n.*//;h' file
Append the previous key (or nothing if it is the first line) to the current line. Remove the key and the previous key if they match, print the current line and repeat. Otherwise the key did not match, remove the old appended key, store the new key in the hold space and print the new key.