
SED, deleting lines between the patterns

This is about using sed to delete the lines between two patterns, excluding the lines that contain the patterns themselves.

If the second pattern appears two or more times, I want the lines to be deleted up to the last occurrence of the second pattern.

How would I do that?

The main thing to realize is that sed operates on individual lines, not on the whole file at once, which means that without special treatment a regex cannot match across multiple lines. To operate on the whole file at once, you first have to read the whole file into memory. There are many ways to do this; one of them is

sed '1h; 1!H; $!d; x; s/regex/replacement/' filename

This works as follows:

1h   # When processing the first line, copy it to the hold buffer.
1!H  # When processing a line that's not the first, append it to the hold buffer.
$!d  # When processing a line that's not the last, stop working here.
x    # If we get here, we just appended the last line to the hold buffer, so
     # swap hold buffer and pattern space. Now the whole file is in the pattern
     # space, where we can apply regexes to it.
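
To see the slurp idiom on its own, here is a minimal sketch that swaps the placeholder s/regex/replacement/ for a substitution joining all lines with commas. The three-line input is made up for illustration.

```shell
# Slurp all input lines into the pattern space, then replace every
# embedded newline with a comma (illustrative input, not from the question).
out=$(printf 'a\nb\nc\n' | sed '1h; 1!H; $!d; x; s/\n/,/g')
echo "$out"   # prints a,b,c
```

Because the substitution only runs once, on the last line, it sees the newlines between the original lines as ordinary characters.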

I like to use this one because it doesn't involve jump labels. Some seds (notably BSD sed, as shipped with *BSD and Mac OS X) are a bit picky when those are involved.

So, all that's left is to formulate a multi-line regex. Since you didn't specify the delimiter patterns, let me assume that you want to remove the lines between the first line that contains START and the last line that contains END. This could be done with

sed '1h; 1!H; $!d; x; s/\(START[^\n]*\).*\(\n[^\n]*END\)/\1\2/' filename

The regex does not contain anything spectacular; mainly you have to be careful to use [^\n] in the right places to avoid greedily matching beyond the end of a line.
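
As a quick check of the command above, here is a sketch run against a made-up sample file (the contents and the temporary file are assumptions, not from the question). It assumes GNU sed, which treats \n inside a bracket expression as a newline; other seds may read it literally.

```shell
# Hypothetical sample input written to a throwaway temp file.
demo=$(mktemp)
printf '%s\n' 'header' 'START here' 'drop 1' 'END first' 'drop 2' 'last END' 'footer' > "$demo"

# Slurp the file, then cut everything between the first START line
# and the last line containing END (both delimiter lines are kept).
sed_out=$(sed '1h; 1!H; $!d; x; s/\(START[^\n]*\).*\(\n[^\n]*END\)/\1\2/' "$demo")
echo "$sed_out"
rm -f "$demo"
```

With GNU sed this should leave header, START here, last END and footer: the greedy .* runs past "END first" to the last line containing END.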

Note that this will only work as long as the file is small enough to be read completely into memory. If this is not the case, my suggestion is to make two passes over the file with awk:

awk 'NR == FNR && /START/ && !start { start = NR } NR == FNR && /END/ { end = NR } NR != FNR && (FNR <= start || FNR >= end)' filename filename

This works as follows: since filename is passed to awk twice, awk will process the file twice. NR is the overall record (line, by default) count, FNR the number of records read so far from the current file. In the first pass over the file, NR and FNR are equal; after that they are not. So:

# If this is the first pass over the file, the line matches the start pattern,
# and the start marker hasn't been set yet, set the start marker
NR == FNR && /START/ && !start { start = NR }

# If this is the first pass over the file and the line matches the end line,
# set the end marker to the current line (this means that the end marker will
# always identify the last occurrence of the end pattern that was seen so far)
NR == FNR && /END/             { end   = NR }

# In the second pass, print those lines whose number is less than or equal to
# the start marker or greater than or equal to the end marker.
NR != FNR && (FNR <= start || FNR >= end)
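
The two-pass script can be tried on a small made-up file, as in the sketch below (the contents and the temporary file are assumptions for illustration). The first pass should record start = 2 and end = 6, so the second pass prints lines 1-2 and 6-7.

```shell
# Hypothetical sample input written to a throwaway temp file.
demo=$(mktemp)
printf '%s\n' 'header' 'START here' 'drop 1' 'END first' 'drop 2' 'last END' 'footer' > "$demo"

# Pass 1 records the first START line and the last END line;
# pass 2 prints everything outside that range (delimiters included).
awk_out=$(awk 'NR == FNR && /START/ && !start { start = NR }
               NR == FNR && /END/             { end   = NR }
               NR != FNR && (FNR <= start || FNR >= end)' "$demo" "$demo")
echo "$awk_out"
rm -f "$demo"
```

Only two line numbers are held in memory, which is why this approach also suits large files.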

To follow up on Wintermute's answer: if you've found a block that matches, you can delete it along the way, so you don't have to keep the entire file in memory:

sed '/^START$/{:a;N;/.*\nEND$/d;ba}'

(sorry, would have replied to Wintermute's answer, but apparently I still need 50 reputation points for that privilege)
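
A sketch of the same commands on made-up input, with each command given as its own -e fragment (an assumption on my part, chosen because some seds are strict about what may follow a label on the same line):

```shell
# Same script as the one-liner, one command per -e fragment.
# Input lines are illustrative only.
stream_out=$(printf '%s\n' 'keep 1' 'START' 'drop' 'END' 'keep 2' |
  sed -e '/^START$/{' -e ':a' -e 'N' -e '/.*\nEND$/d' -e 'ba' -e '}')
echo "$stream_out"
```

Note that this variant removes the START and END lines together with the block, and that it stops at the first END it meets, not the last one.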

No example input was given, so I'm guessing an example file and the patterns /line3/ and /line6/.

line1 #keep - up to 1st pattern line3 - including
line2 #keep
line3 #keep
line4 #delete up to last occurrence of line6
line5
line6a
line7
line6b
line8 #delete
line6c #keep - the last line6
line9  #keep
line10 #keep

A method without any dark voodoo, though an inefficient one, could be:

(sed -n '1,/line3/p' file; tail -r file | sed -n '1,/line6/p' | tail -r) > file2

file2 will then contain:

line1
line2
line3
line6c
line9
line10

Explanation:

sed -n '1,/line3/p' file; # prints line 1 up to pattern (included)

tail -r file | sed -n '1,/line6/p' | tail -r
#reverse the file
#print the lines up to pattern2
#reverse the result
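
tail -r is a BSD option; on systems with GNU coreutils, tac reverses a file the same way. The sketch below assumes tac is available and rebuilds the sample file without the #keep/#delete annotations.

```shell
# Recreate the sample file (annotations left out) in a throwaway temp file.
file=$(mktemp)
printf '%s\n' line1 line2 line3 line4 line5 line6a line7 line6b line8 line6c line9 line10 > "$file"

# Head: line 1 through the first /line3/ match.
# Tail: reverse, print through the first /line6/ match, reverse back.
tac_out=$(sed -n '1,/line3/p' "$file"; tac "$file" | sed -n '1,/line6/p' | tac)
echo "$tac_out"
rm -f "$file"
```

The first /line6/ match in the reversed file is the last one in the original, which is how the last occurrence is found without any multi-line regex.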
