How to join several consecutive lines using one pattern for the first line and another pattern for all following lines, preferably with sed?

I want to remove every "Derived words" section from this example; there are two of them. So far my idea is to join the lines that follow "Derived words:" onto that line and then delete the result, but I can't simply join a fixed number of lines, because the number of lines differs from article to article. So my plan is: check whether a line matches the pattern '^Derived words:', then check whether the next line matches the pattern '^[a-z]'; if it does, join them and check the next line, and so on. It sounds like a job tailored for Bash's if-then-else, but I'd prefer a pure sed solution if possible.

A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision... 
The police were swift to act. 
Syn:
quick
Derived words:
swiftly  The French have acted swiftly and decisively to protect their industries. 
swiftness  The secrecy and swiftness of the invasion shocked and amazed army officers. 
  Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright. 
Syn:
quick
Derived words:
swiftly  ...a swiftly flowing stream. 
swiftness  With incredible swiftness she ran down the passage. 
  A swift is a small bird with long curved wings.

Expected results

A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision... 
The police were swift to act. 
Syn:
quick
  Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright. 
Syn:
quick
  A swift is a small bird with long curved wings.

Thanks in advance.

This might work for you (GNU sed):

sed -n '/^Derived words:/{:a;n;/^\w/ba};p' file

Use sed's grep-like -n option and, on encountering Derived words:, keep reading lines until one does not start with a word character. That line, like every line outside a block, is then printed by the explicit p.
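The same loop can be spelled out one command per line. The following is a sketch rather than the answerer's exact script: it swaps GNU's \w for the bracket expression [[:alnum:]_] (the same set of word characters) so that it does not depend on GNU extensions, and the function name strip_derived is a hypothetical wrapper. The sample lines are taken from the question.

```shell
# Hypothetical wrapper: same logic as the one-liner, one command per line.
strip_derived() {
  sed -n '
    # A "Derived words:" line opens a block that must not be printed.
    /^Derived words:/ {
      :a
      # Replace the pattern space with the next input line (no output under -n).
      n
      # Still a continuation line (starts with a word character)? Keep skipping.
      /^[[:alnum:]_]/ ba
    }
    # Everything that reaches this point is printed.
    p
  ' "$@"
}

printf '%s\n' \
  'Syn:' \
  'quick' \
  'Derived words:' \
  'swiftly  The French have acted swiftly and decisively to protect their industries.' \
  '  Something that is swift moves very quickly.' |
  strip_derived
# prints:
# Syn:
# quick
#   Something that is swift moves very quickly.
```

If the input ends inside a "Derived words:" block, n finds no further line and sed quits without printing, so a trailing block is dropped as well.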

I find that when you want to work on blocks of many lines, the best tool tends to be awk, for example:

awk '/^Derived words/{skip=1} /^ /{skip=0} 1{if(!skip)print}' input

A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision...
The police were swift to act.
Syn:
quick
  Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright.
Syn:
quick
  A swift is a small bird with long curved wings.
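For readers less used to awk, here is a sketch of the same program with one rule per line. The function name skip_derived_awk is a hypothetical wrapper, and the final rule is written as the bare pattern !skip, which behaves like 1{if(!skip)print}. Note the assumption it relies on: skipping ends at the first line that starts with a space, which holds for the sample because each new definition is indented.

```shell
# Hypothetical wrapper around the awk program, one rule per line.
skip_derived_awk() {
  awk '
    # A "Derived words" header switches skipping on.
    /^Derived words/ { skip = 1 }
    # An indented line (the next definition) switches it off again.
    /^ /             { skip = 0 }
    # A pattern with no action prints the line when the pattern is true.
    !skip
  ' "$@"
}

printf '%s\n' \
  'quick' \
  'Derived words:' \
  'swiftness  With incredible swiftness she ran down the passage.' \
  '  A swift is a small bird with long curved wings.' |
  skip_derived_awk
# prints:
# quick
#   A swift is a small bird with long curved wings.
```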

This should work in regular (non-GNU) sed. There may be a way to eliminate the redundant pattern, but I haven't come up with it yet.

sed -e :a -e '/^Derived words:/N;s/\n[a-z]//;ta' -e 's/^Derived words:.*\n//'

Here's how it works:

  • You said that you want to remove "Derived words:" and any lines that follow it if they start with a letter (let's call those continuation lines).
  • So sed reads the input and echoes it to stdout, line by line, as usual.
  • But when it encounters "Derived words:" at the start of a line, instead of echoing it, it reads the next line and appends it to the pattern space after "Derived words:", with a newline separating them (the N command). Nothing has been echoed since it saw "Derived words:". It then tries to delete that newline and the lowercase letter immediately following it (the first s command).

    • If it can, then it must have found a continuation line, so it tries to do that again by jumping to the start of the script (the t command, which conditionally jumps to the label "a" defined up front with the colon command), where it appends the next line, and so on.
    • If it can't, it's left with the "Derived words:" line, plus any continuation lines appended to it (without their newlines, which were removed), plus the next non-continuation line, which is separated from the rest by a newline.
  • If it then sees that it has a line that starts with "Derived words:", it deletes it up to and including the newline (the second s command), leaving the part that follows the newline, which is the next non-continuation line, and echoes that. Then it resumes processing the input with the next line.
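The steps above can be followed in a commented, multi-line layout of the same script. This is a sketch: the function name strip_derived_posix is a hypothetical wrapper, the commands are the ones from the one-liner, and the sample lines come from the question.

```shell
# Hypothetical wrapper: the same commands as the one-liner, one per line.
strip_derived_posix() {
  sed '
    :a
    # On a line starting with "Derived words:", append the next input line.
    /^Derived words:/N
    # If the appended line starts with a lowercase letter, glue it on
    # by deleting the newline and that letter.
    s/\n[a-z]//
    # If that substitution succeeded, loop to pull in one more line.
    ta
    # Otherwise delete the accumulated block up to and including the newline,
    # keeping only the non-continuation line that follows it.
    s/^Derived words:.*\n//
  ' "$@"
}

printf '%s\n' \
  'quick' \
  'Derived words:' \
  'swiftly  The French have acted swiftly and decisively to protect their industries.' \
  'swiftness  The secrecy and swiftness of the invasion shocked and amazed army officers.' \
  '  Something that is swift moves very quickly.' |
  strip_derived_posix
# prints:
# quick
#   Something that is swift moves very quickly.
```

Just before the final s command runs, the pattern space holds "Derived words:wiftly ...wiftness ..." followed by a newline and the indented line, which is why deleting everything up to the newline leaves only that last line. Because the continuation test is [a-z], a continuation line that starts with an uppercase letter or a digit would end the block early.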
