
Sed regexp multiline - replace HTML

I am attempting to replace multiple lines using sed on a Linux system.

Here is my file

<!-- PAGE TAG -->
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
<div id="DATA"></div>
DATA8
DATA9
<!-- PAGE TAG -->

These are the attempts I have made, all of which failed:

sed -n '1h;1!H;${;g;s/<!-- PAGE TAG -->.*<!-- PAGE TAG -->//g;p;}' 
sed -n '1!N; s/<!-- PAGE TAG -->.*<!-- PAGE TAG -->// p'
sed -i 's|<!--[^>]*-->[^+]+<!--[^>]*-->||g' 
sed -i 's|/\/\/<!-- PAGE TA -->/,/\/\/<!-- PAGE TA -->||g'

Everything between the two <!-- PAGE TAG --> lines should be replaced.

This question is similar to "sed multiline replace".

Adapting the answer given in that linked question, this should work:

sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/d'

The command has the form [2addr]d, where the two addresses are /<!-- PAGE TAG -->/ and /<!-- PAGE TAG -->/, separated by a comma. d deletes every line from the one that matches the first address through the next one that matches the second address, inclusive. (This means that text outside the tags, but on the same line as a tag, is deleted as well.)
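
As a quick check of the behaviour described above, here is a minimal sketch run against a made-up sample file (the header, footer, and the extra text on the tag lines are assumptions added for illustration, they are not part of the original file). It shows that text sharing a line with a tag is deleted together with the block:

```shell
# Build a small sample file (hypothetical contents) with text around the tagged block.
sample=$(mktemp)
printf '%s\n' \
  'header' \
  'keep me? <!-- PAGE TAG -->' \
  'DATA1' \
  'DATA2' \
  '<!-- PAGE TAG --> trailing text' \
  'footer' > "$sample"

# Delete every line from the first tag line through the next tag line, inclusive.
# The end address is only tested on lines after the one that opened the range,
# so using the same regex for both addresses is fine.
result=$(sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/d' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Only the header and footer lines survive; "keep me?" and "trailing text" are gone because d always removes whole lines.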


Although Tim Pote has answered the question, I will post this here in case someone needs to replace a multiline pattern:

sed -n '1h; 1!H; ${g; s/<!-- PAGE TAG -->[^!]*<!-- PAGE TAG -->//g; p;}'

I modified the solution from an existing source, where most of the command is explained.

The regex here is a bit patchy, since it assumes there is no ! character in the data between the two page tags. Without this assumption, I cannot limit how much text the regex matches, since sed has no lazy quantifier (as far as I know).

Unlike the range-based d command, this solution does not remove text that lies outside the tags, even if it is on the same line as a tag.
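
To illustrate both points, here is a small sketch that runs the command on two made-up inputs (the file contents are assumptions for illustration only). The first input has text on the same lines as the tags; the second contains a ! between the tags, which defeats the [^!]* pattern:

```shell
# The hold-space script from above: collect the whole file, then substitute once.
script='1h; 1!H; ${g; s/<!-- PAGE TAG -->[^!]*<!-- PAGE TAG -->//g; p;}'
sample=$(mktemp)

# Case 1 (hypothetical data): text before and after the tags is preserved.
printf '%s\n' \
  'before <!-- PAGE TAG -->' \
  'DATA1' \
  '<div id="DATA"></div>' \
  '<!-- PAGE TAG --> after' > "$sample"
kept=$(sed -n "$script" "$sample")
printf '%s\n' "$kept"

# Case 2 (hypothetical data): a "!" between the tags means the regex cannot match,
# so the input is printed unchanged.
printf '%s\n' \
  '<!-- PAGE TAG -->' \
  'Hello!' \
  '<!-- PAGE TAG -->' > "$sample"
unchanged=$(sed -n "$script" "$sample")
printf '%s\n' "$unchanged"

rm -f "$sample"
```

In the first case the two tags and everything between them, newlines included, are removed, leaving "before" and "after" joined on one line.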

While @nhahtdh's answer is the correct one for your original question, this solution is the answer to your comments:

sed '
  /<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ {
    1 {
      s/^.*$/Replace Data/
      b
    }
    d
  }
'

You can read it like so:

/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ -> for the lines between these regexes

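1 { -> for line 1 of the input (the address 1 is a line number, not "the first line of the range", so this works only when the opening tag is on the first line of the file, as in your sample)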

s/^.*$/Replace Data/ -> search for anything and replace with Replace Data

b -> branch to end (behaves like break in this instance)

d -> otherwise, delete the line

You can turn any series of sed commands into a one-liner with GNU sed by adding a semicolon after each command (but it's not recommended if you want to be able to read it later on):

sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ { 1 { s/^.*$/Replace Data/; b; }; d; };'
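
If the opening tag is not on the first line of the file, the 1 address never matches inside the range and the block is simply deleted. One possible alternative, offered as a sketch rather than as part of the original answer, is the c (change) command, which deletes the range and prints the replacement text once at the end of it. The sample file below is made up for illustration:

```shell
# Hypothetical sample where the tagged block does NOT start on line 1.
sample=$(mktemp)
printf '%s\n' \
  'header' \
  '<!-- PAGE TAG -->' \
  'DATA1' \
  'DATA2' \
  '<!-- PAGE TAG -->' \
  'footer' > "$sample"

# The one-liner from above (GNU sed syntax): "1" is line number 1, which lies
# outside the range here, so every line of the block is deleted and nothing
# is put in its place.
original=$(sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/ { 1 { s/^.*$/Replace Data/; b; }; d; }' "$sample")
printf '%s\n' "$original"

# Alternative: c\ replaces the whole range with one line of text,
# wherever the range happens to start.
replaced=$(sed '/<!-- PAGE TAG -->/,/<!-- PAGE TAG -->/c\
Replace Data' "$sample")
printf '%s\n' "$replaced"

rm -f "$sample"
```

Like d, the c command works on whole lines, so any text sharing a line with a tag is replaced as well.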

Just as a side note, you should really try to be as specific as possible in your posting. "replaced/removed" means "replaced OR removed". If you want it replaced, just say replaced. That helps both those of us trying to answer your question and future users who might be experiencing the same issue.
