Using sed/awk to remove string from subsections
I have a file that looks like this:
bar
barfo
barfoo
barfooo
barfoooo

sample
sampleText1
sampleText2
sampleText3

prefix
prefixFooBar
prefixBarFoo
What I want sed (or awk) to do is remove the string that introduces a section from all the other lines of that section, so that I end up with:
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
I tried using
sed -e -i '/([[:alpha:]]+)/,/^$/ s/\1//g' file
But that fails with "Invalid Backreference".
$ awk '{$0=substr($0,idx)} !idx{idx=length($0)+1} !NF{idx=0} 1' file
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
Another awk:
$ awk '{sub(pre,"")}1; !NF{pre=""} !pre{pre=$1}' file
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
perl -ple'
if (!length($_)) { $re = "" }
elsif (!length($re)) { $re = $_ }
else { s/^\Q$re// }
'
Notes:
Use s/\Q$re//g to remove the string anywhere in the line instead of just removing the prefix.
\Q makes special characters in the prefix, such as \, . and *, match literally.
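The same concern applies to the awk answers, where sub() and gsub() treat the saved header as a regular expression. A minimal sketch of a literal-prefix variant follows, assuming sections are separated by empty lines; the function name strip_literal_prefix and the sample header a.b are invented for the illustration.

```shell
# Hypothetical helper: strips the section header from the start of each line,
# comparing it as a plain string (index/substr) instead of as a regex.
strip_literal_prefix() {
  awk '
    NF == 0   { pre = ""; print; next }   # an empty line ends the section
    pre == "" { pre = $0; print; next }   # first line of a section is the header
    index($0, pre) == 1 { $0 = substr($0, length(pre) + 1) }  # literal match at column 1
    { print }
  ' "$@"
}

# The header "a.b" contains a regex metacharacter. A regex match would also
# hit "axbY"; the literal comparison leaves that line untouched.
printf 'a.b\na.bX\naxbY\n' | strip_literal_prefix
```

Only lines that really begin with the header text are shortened, which is the effect \Q has in the perl version.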
A sed solution, mostly to illustrate that sed is probably not the best choice to do this:
$ sed -E '1{h;b};/^$/{n;h;b};G;s/^(.*)(.*)\n\1$/\2/' infile
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
Here is how it works:
1 {                       # on the first line
    h                     # copy pattern buffer to hold buffer
    b                     # skip to end of cycle
}
/^$/ {                    # if line is empty
    n                     # print it and get next line into pattern buffer
    h                     # copy pattern buffer to hold buffer
    b                     # skip to end of cycle
}
G                         # append hold buffer to pattern buffer
s/^(.*)(.*)\n\1$/\2/      # substitute
The complex part is in the substitution. Before the substitution, the pattern buffer holds something like this:
prefixFooBar\nprefix
The regex has two capture groups. The backreference \1 between \n and the end of the string forces the first group to equal the prefix we fetched from the hold buffer, so the second group captures the rest of the original line. The replacement \2 is then the original line with the prefix removed.
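To look at the substitution on its own, the pattern buffer can be built by hand. This is only a sketch: N joins two input lines and stands in for what G produces in the real script, and the function name strip_saved_prefix is made up for the example.

```shell
# Hypothetical helper: reads a line followed by its prefix, joins them with N
# (mimicking the pattern buffer after G), then applies the same substitution.
strip_saved_prefix() {
  sed -E 'N;s/^(.*)(.*)\n\1$/\2/'
}

# The pattern buffer becomes "prefixFooBar\nprefix". \1 has to equal "prefix",
# so the first group backtracks until \2 holds "FooBar".
printf 'prefixFooBar\nprefix\n' | strip_saved_prefix
```

Because both groups are .*, the regex engine has to backtrack to find the split, which is part of why this approach is harder to read than the awk versions.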
Remarks:
Older GNU sed versions may need -r instead of -E.
-E is just for convenience; without it, the substitution would look like
s/^\(.*\)\(.*\)\n\1$/\2/
but still work.
For macOS sed, it works with literal linebreaks between commands:
sed -E '1{
h
b
}
/^$/{
n
h
b
}
G
s/^(.*)(.*)\n\1$/\2/' infile
Here's another sed solution. It works only if all strings in a paragraph start with the subject line.
sed -e '1{h;b};/^$/{n;h;b};H;g;s/\(.*\)\n\1//;p;g;s/\n.*//;h;d' file
1: first line. h copies it to the hold space, b prints it and continues with the next line.
/^$/: empty lines. n prints the line and reads the next one, h copies that line to the hold space, b prints it and continues.
H: append the line to the hold space, after a newline.
g: copy the hold space to the pattern space.
s/\(.*\)\n\1//: remove the first line, and its copy at the start of the second line, from the pattern space.
p: print the pattern space.
g: copy the hold space to the pattern space again, in order to remove what H appended.
s/\n.*//: remove the appended contents.
h: copy the result back to the hold space.
d: delete the pattern space.
sed is not useful for these things.
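You get 'Invalid back reference' because there's no group in the search pattern of s. The parentheses in the address /([[:alpha:]]+)/ belong to a separate regular expression, so \1 in s/\1//g has nothing to refer to.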
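A small sketch of that scoping rule, using a hard-coded prefix purely for illustration (the function name drop_leading_bar is hypothetical, and the exact error text differs between sed implementations, so it is suppressed here):

```shell
# Rejected: (bar) lives in the address, yet \1 is used in the s pattern,
# which is compiled as a regular expression of its own.
printf 'barfoo\n' | sed -E '/(bar)/ s/\1//' 2>/dev/null || echo 'sed rejected the script'

# Accepted: the group and the reference to it sit in the same s command.
drop_leading_bar() {
  sed -E 's/^(bar)(.*)$/\2/'
}
printf 'barfoo\n' | drop_leading_bar
```

This is why the working sed answers first bring the prefix into the pattern space with G or H, so that the group and its backreference end up in one regex.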
Another in awk:
$ awk '{if(p&&match($0,"^" p))$0=substr($0,RLENGTH+1);else p=$0}1' file
Output:
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
Here's another awk solution:
awk '{gsub(s,"")}1; s==""||!NF{s=$0}' file
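Pros: it still works when a section header is 0, which awk treats as false, because the header is tested with s=="" rather than !s.
Cons: gsub treats the header as a regular expression and removes every match in the line, not just the leading one.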
This might work for you (GNU sed):
sed 'G;s/^\(.\+\)\(.*\)\n\1$/\2/;t;s/\n.*//;h' file
Append the previous key (or nothing if it is the first line) to the current line. If the line starts with that key, remove both the key and the appended copy, print what remains and move on to the next line. Otherwise the key did not match: remove the appended old key, store the current line in the hold space as the new key, and print it.
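To watch those steps, the pattern space can be dumped right after G. The sketch below assumes GNU sed (the script uses \+) and adds -n plus the l command to the script above; the function name show_pattern_space is invented for the example.

```shell
# Hypothetical helper: same script as above, with -n to silence the normal
# output and l to print the pattern space unambiguously right after G.
show_pattern_space() {
  sed -n 'G;l;s/^\(.\+\)\(.*\)\n\1$/\2/;t;s/\n.*//;h'
}

# For the header line the hold space is still empty, so l shows bar\n$ .
# For the next line the stored key has been appended: barfo\nbar$ .
printf 'bar\nbarfo\n' | show_pattern_space
```

In the l output, \n is the newline inserted by G and $ marks the end of the pattern space.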