Using sed/awk to remove string from subsections

I have a file that looks like this:

bar
barfo
barfoo
barfooo
barfoooo

sample
sampleText1
sampleText2
sampleText3

prefix
prefixFooBar
prefixBarFoo

What I want sed (or awk) to do is remove the string that introduces a section from every other line of that section, so that I end up with:

bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

I tried using

sed -e -i '/([[:alpha:]]+)/,/^$/ s/\1//g' file

But that fails with "Invalid Backreference".

$ awk '{$0=substr($0,idx)} !idx{idx=length($0)+1} !NF{idx=0} 1' file
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo
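
The one-liner above comes without an explanation, so here is a commented sketch of the same idea. It cuts each line by the length of the section header rather than by matching the header text. The function name strip_by_length is made up for illustration, and the guard on idx is a small restructuring of the original rules, not the answerer's exact code.

```shell
# Hypothetical wrapper around the length-based awk approach.
strip_by_length() {
  awk '
    # Inside a section: keep only the text after the header-length prefix.
    idx  { $0 = substr($0, idx) }
    # Header line (idx not set yet): remember where the remainder starts.
    !idx { idx = length($0) + 1 }
    # Blank line: the section is over, the next line is a new header.
    !NF  { idx = 0 }
    { print }
  ' "$@"
}

printf 'bar\nbarfo\nbarfoo\n\nsample\nsampleText1\n' | strip_by_length
```

Because only the length is used, a line that does not actually start with the header still loses its first characters, so this assumes well-formed input like the sample.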

Another awk:

$ awk '{sub(pre,"")}1; !NF{pre=""} !pre{pre=$1}' file

bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

A Perl solution:

perl -ple'
   if (!length($_)) { $re = "" }
   elsif (!length($re)) { $re = $_ }
   else { s/^\Q$re// }
'

Notes:

  • Use s/\Q$re//g to remove the string anywhere in the line instead of just removing the prefix.
  • This works even if the header line includes special characters such as \, . and *.
  • This works even if there are multiple blank lines in a row.
  • See "Specifying file to process to Perl one-liner" for complete usage.
  • The line breaks in the code are optional (i.e. they can be removed).

A sed solution, mostly to illustrate that sed is probably not the best choice for this:

$ sed -E '1{h;b};/^$/{n;h;b};G;s/^(.*)(.*)\n\1$/\2/' infile
bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

Here is how it works:

1 {                   # on the first line
  h                   # copy pattern buffer to hold buffer
  b                   # skip to end of cycle
}
/^$/ {                # if line is empty
  n                   # get next line into pattern buffer
  h                   # copy pattern buffer to hold buffer
  b                   # skip to end of cycle
}
G                     # append hold buffer to pattern buffer
s/^(.*)(.*)\n\1$/\2/  # substitute

The complex part is the substitution. Before the substitution, the pattern buffer holds something like this:

prefixFooBar\nprefix

The substitution now matches two capture groups. The first must equal, via the backreference \1, whatever sits between \n and the end of the string, which is the prefix we fetched from the hold buffer. The second group is everything on the original line after that prefix.

The replacement is then the rest of the original line, with the prefix removed.
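
To see the pattern buffer described above, the l command can print it unambiguously just before the substitution would run. This is a minimal sketch for inspection only: the function name show_pattern_space is hypothetical, and the script keeps just the first-line and G steps of the answer.

```shell
# Hypothetical helper: show what the pattern buffer holds after G.
show_pattern_space() {
  sed -n '
    # First line: store the header in the hold buffer and end the cycle.
    1 {
      h
      b
    }
    # Other lines: append the hold buffer, then list the pattern buffer.
    G
    l
  ' "$@"
}

printf 'prefix\nprefixFooBar\n' | show_pattern_space
# prints: prefixFooBar\nprefix$
```

The trailing $ is how l marks the end of the buffer, and the \n is the embedded newline that G inserted between the line and the stored header.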

Remarks:

  • This works with GNU sed; older GNU sed versions might need -r instead of -E.
  • -E is just for convenience; without it, the substitution would look like

     s/^\(.*\)\(.*\)\n\1$/\2/

    but still work.

  • For macOS sed, it works with literal line breaks between commands:

     sed -E '1{
     h
     b
     }
     /^$/{
     n
     h
     b
     }
     G
     s/^(.*)(.*)\n\1$/\2/' infile

Here's another sed solution. It works only if all strings in a paragraph start with the subject line.

sed -e '1{h;b};/^$/{n;h;b};H;g;s/\(.*\)\n\1//;p;g;s/\n.*//;h;d' file

  • 1 first line: h copy to hold space, b skip to the end of the script (which prints the line) and continue with the next line
  • /^$/ empty lines: n print it and read the next line, h copy to hold space, b print and continue
  • all (the other) lines:
    • H append to hold space with a newline
    • g copy hold space to pattern space
    • s/\(.*\)\n\1// remove the first line and its contents in the second line from pattern space
    • p print pattern space
    • g copy hold space to pattern space in order to remove the new contents that H appended
    • s/\n.*// remove the new contents
    • h copy back to hold space
    • d delete pattern space

sed is not useful for these things.

You get 'Invalid back reference' because there's no group in the search pattern of s.
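
As a small illustration of that rule (a sketch, with a made-up function name collapse_pair): a backreference resolves only against a group captured in the same regular expression, so the group has to appear in the pattern of s itself, not in the address that selects the lines.

```shell
# Hypothetical helper: collapse a doubled "bar" into a single one.
# The group \(bar\) and the backreference \1 live in the same s pattern,
# so sed can resolve \1. A group in an address regex would not count.
collapse_pair() {
  sed 's/\(bar\)\1/\1/' "$@"
}

printf 'barbar\n' | collapse_pair   # prints: bar
```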

Another in awk:

$ awk '{if(p&&match($0,"^" p))$0=substr($0,RLENGTH+1);else p=$0}1' file

Output:

bar
fo
foo
fooo
foooo

sample
Text1
Text2
Text3

prefix
FooBar
BarFoo

Here's another awk solution:

awk '{gsub(s,"")}1; s==""||!NF{s=$0}' file

Pros:

  • Matches are replaced wherever they occur in the line
  • All matches are replaced
  • Header line may evaluate to 0 / false
  • Header line may contain whitespace

Cons:

  • Header line must not contain regular expression metacharacters
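
If the metacharacter limitation matters, one possible workaround is to treat the header as a literal string with index() and substr() instead of passing it to gsub() as a regex. The following is a sketch under that assumption, not part of the original answer; the function name strip_literal and the variable names are made up.

```shell
# Hypothetical variant: remove every literal occurrence of the header.
strip_literal() {
  awk '
    {
      blank = ($0 == "")
      if (s != "") {
        # Cut out each occurrence of s, comparing it as a plain string.
        out = ""; rest = $0
        while ((i = index(rest, s)) > 0) {
          out = out substr(rest, 1, i - 1)
          rest = substr(rest, i + length(s))
        }
        $0 = out rest
      }
      print
      # A blank line ends the section; the next non-blank line is a header.
      if (blank) s = ""
      else if (s == "") s = $0
    }
  ' "$@"
}

printf '%s\n' 'a.b*' 'a.b*X' 'axbY' | strip_literal
```

With the header a.b*, the line axbY is left alone here, whereas a regex-based gsub would treat a.b* as a pattern and match axb.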

This might work for you (GNU sed):

sed 'G;s/^\(.\+\)\(.*\)\n\1$/\2/;t;s/\n.*//;h' file

Append the previous key (or nothing if it is the first line) to the current line. If the current line starts with that key, remove both the leading key and the appended copy, print what is left and repeat with the next line. Otherwise the key did not match: remove the old appended key, store the current line as the new key in the hold space and print it.
