
Sed regexp multiline - replace HTML

I am attempting to replace multiple lines using sed on a Linux system.

Here is my file:

<!-- PAGE TAG -->
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
<div id="DATA"></div>
DATA8
DATA9
<!-- PAGE TAG -->

Here are the attempts I have made, all of which failed:

sed -n '1h;1!H;${;g;s/<!-- PAGE TAG -->.*<!-- PAGE TAG -->//g;p;}' 
sed -n '1!N; s/<!-- PAGE TAG -->.*<!-- PAGE TAG -->// p'
sed -i 's|<!--[^>]*-->[^+]+<!--[^>]*-->||g' 
sed -i 's|/\/\/<!-- PAGE TA -->/,/\/\/<!-- PAGE TA -->||g'

Everything in between the two <!-- PAGE TAG --> lines should be replaced.

This question is similar to "sed multiline replace".

Adapting from the answer given in the linked question, this should work:

sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/d'

The command has the form [2addr]d, where the two addresses, /<!-- PAGE TAG -->/ and /<!-- PAGE TAG -->/, are separated by a comma. d deletes every line from the one that matches the first address through the one that matches the second address, inclusive. (This means that anything outside the tags, but on the same line as a tag, is deleted as well.)
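To make the inclusive behaviour concrete, here is a small sketch. The sample text is made up for illustration (it is not the asker's file), with extra words placed on the same lines as the tags:

```shell
# Hypothetical sample input: text sits on the same lines as the tags.
sample='keep this line
before <!-- PAGE TAG -->
DATA1
<div id="DATA"></div>
<!-- PAGE TAG --> after
keep this line too'

# Delete from the first line matching the tag through the next line matching it.
result=$(printf '%s\n' "$sample" | sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/d')
printf '%s\n' "$result"
```

Only the two "keep" lines are printed; "before" and "after" disappear together with the tag lines they share.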


Although Tim Pote has answered the question, I will post this here in case someone needs to replace a multiline pattern:

sed -n '1h; 1!H; ${g; s/<!-- PAGE TAG -->[^!]*<!-- PAGE TAG -->//g; p;}'

I modified the solution from an existing source, so most of the command is explained here.

The regex here is a bit patchy, since it assumes there is no ! character in the data between the two page tags. Without this assumption, I cannot control the number of characters matched by the regex, since sed has no lazy quantifier (as far as I know).

This solution will not remove text before the tag, even if it is on the same line as the tag.
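A quick sketch of both points, using made-up sample text and a hypothetical helper function named strip_block that wraps the command above:

```shell
# strip_block is a hypothetical wrapper around the hold-space command above:
# it collects the whole input in the hold space, then substitutes once at the end.
strip_block() {
  sed -n '1h; 1!H; ${g; s/<!-- PAGE TAG -->[^!]*<!-- PAGE TAG -->//g; p;}'
}

# Case 1: text shares a line with each tag and survives the substitution.
sample='before <!-- PAGE TAG -->
DATA1
<div id="DATA"></div>
<!-- PAGE TAG --> after
tail'
result=$(printf '%s\n' "$sample" | strip_block)
printf '%s\n' "$result"

# Case 2: a "!" between the tags defeats [^!]*, so nothing is removed.
sample_bang='<!-- PAGE TAG -->
DATA!
<!-- PAGE TAG -->'
result_bang=$(printf '%s\n' "$sample_bang" | strip_block)
printf '%s\n' "$result_bang"
```

In the first case the tagged block is cut out and "before" and "after" end up joined on one line. In the second case the input comes back unchanged, which is the limitation described above.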

While @nhahtdh's answer is the correct one for your original question, this solution answers your comments:

sed '
  /<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ {
    1 {
      s/^.*$/Replace Data/
      b
    }
    d
  }
'

You can read it like so:

/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ -> for the lines between these regexes (inclusive)

1 { -> for line 1 of the input, which is the opening tag line in the sample file (1 is a line-number address, so this relies on the range starting on the first line)

s/^.*$/Replace Data/ -> search for anything and replace it with Replace Data

b -> branch to the end of the script (behaves like break in this instance, so the d below is skipped)

d -> otherwise, delete the line

You can turn any series of sed commands into a one-liner with GNU sed by adding a semicolon after each command (but it's not recommended if you want to be able to read it later on):

sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ { 1 { s/^.*$/Replace Data/; b; }; d; };'
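Because 1 is a line-number address, the script above only inserts the replacement text when the opening tag is on the first line of the input. If the block can start anywhere, one possible alternative (a different technique, not the script above) is sed's c command applied to the whole range. The sketch below uses made-up sample text and assumes GNU sed, which prints the replacement text once, at the end of the range:

```shell
# Hypothetical sample input: the tagged block does not start on line 1.
sample='header
<!-- PAGE TAG -->
DATA1
DATA2
<!-- PAGE TAG -->
footer'

# Replace the whole range, tag lines included, with a single line of text.
result=$(printf '%s\n' "$sample" | sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/c\
Replace Data')
printf '%s\n' "$result"
```

With GNU sed this should print header, Replace Data and footer on three lines. Note that if the closing tag is missing, the range never ends and the replacement text is never printed.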

Just as a side note, you should really try to be as specific as possible in your posting. "replaced/removed" means "replaced OR removed". If you want it replaced, just say replaced. That helps both those of us trying to answer your question and future users who might be experiencing the same issue.
