
JavaScript regexp replace of multiline content between two tags (including the tags)

In the string

some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/>

I need to remove

<p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/>

I can't find a way to do it. This is my attempt:

var id = 'item_1';
var patt=new RegExp("<p id='"+id+"'(.)*|([\S\s]*?)end_of_"+id+"'\/>","g");
var str="some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/>";
document.write(str.replace(patt,""));

The result is

some text for
<br>
remove
<p></p>
<br id="<p id=" class="item" clear="all" item_2'="">
another multiline content
<p></p>
<br id="end_of_item_2" clear="all">

Please help me solve this.

Why can't you use the DOM API to remove it? (Add everything to the document, and then remove what you don't need.)

var item1 = document.getElementById('item_1'),
    endOfItem1 = document.getElementById('end_of_item_1');

item1.parentNode.removeChild(item1);
endOfItem1.parentNode.removeChild(endOfItem1);

I need to assume a few unstated constraints from your question to get this to work:

Am I right in guessing that you want a regex that can find (and then replace) any 'p' tag with a specific id, up to a certain tag (e.g. a 'br' tag) with an id of 'end_of_[firstid]'?

If that is correct, then the following regex might work for you. You may need to modify it a bit to get JS to accept it (for instance, "." does not match line breaks in JavaScript, so ".*?" may need to become "[\s\S]*?" for multiline content):

<p\s+id='([a-zA-Z0-9_]+)'.*?id='end_of_\1'\s*\/>

This will match any block that meets the criteria described above, with the id captured as group 1. It should then be a simple task to check whether group 1 contains the id you want to remove and, if so, replace the whole match with an empty string.
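The group-based idea above can be sketched in JavaScript as follows. This is only an illustration under a few assumptions: removeItem is a hypothetical helper name, the attributes are single-quoted with id first (as in the question), and ".*?" has been swapped for "[\s\S]*?" so the match can span line breaks.

```javascript
// Hypothetical helper: removes the block that starts at <p id='ID' ...> and
// ends at the tag carrying id='end_of_ID', but only for the requested id.
function removeItem(html, idToRemove) {
  // Group 1 captures the id; \1 requires the same id after "end_of_".
  // [\s\S]*? is used because "." does not match line breaks in JavaScript.
  var pattern = /<p\s+id='([a-zA-Z0-9_]+)'[\s\S]*?id='end_of_\1'\s*\/>/g;
  return html.replace(pattern, function (match, capturedId) {
    // Keep every block whose id is not the one we were asked to remove.
    return capturedId === idToRemove ? "" : match;
  });
}

var str = "some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/>";

console.log(removeItem(str, "item_1"));
```

Passing a function as the second argument to replace lets the check on group 1 happen per match, so blocks with other ids are returned unchanged.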

If I understand your example correctly (I am not that good with JavaScript, and my regex follows the general Perl style), you could do something like the following:

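// Backslashes are doubled because the pattern is built from a string literal.
// [\s\S]*? replaces .*? so that the match can span line breaks.
var patt = new RegExp("<p\\s+id='" + id + "'[\\s\\S]*?id='end_of_" + id + "'\\s*\\/>", "g");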

That way, you don't have to worry about group matching, although I find it more elegant to match the id you want via a group instead of inserting it into the regex.
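One thing to watch when inserting the id into a pattern built with new RegExp: the string literal consumes single backslashes before the regex engine sees them, so "\s" arrives as a plain "s". The same thing happens to "[\S\s]" in the question's attempt. The sketch below shows the effect and one possible way to build the pattern; escapeRegExp and buildPattern are hypothetical helper names, and the single-quoted, id-first attribute layout is assumed from the question.

```javascript
// A single backslash before "s" is dropped by the string literal.
var wrong = new RegExp("<p\s+id='item_1'");
var right = new RegExp("<p\\s+id='item_1'");
console.log(wrong.source); // <ps+id='item_1'
console.log(right.source); // <p\s+id='item_1'

// The character class from the question degrades the same way.
var lost = new RegExp("([\S\s]*?)");
console.log(lost.source); // ([Ss]*?)

// Hypothetical helper: escapes regex metacharacters so that an id is
// always matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the pattern for one id, with every
// backslash doubled so it survives the string literal.
function buildPattern(id) {
  var safeId = escapeRegExp(id);
  return new RegExp(
    "<p\\s+id='" + safeId + "'[\\s\\S]*?id='end_of_" + safeId + "'\\s*\\/>",
    "g"
  );
}

var input = "some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/>";
var cleaned = input.replace(buildPattern("item_1"), "");
console.log(cleaned);
```

Writing the pattern as a regex literal avoids the doubling, but a literal cannot include a variable id, which is why the string form is used here.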

Here's the regex for the current scenario. When the regex approach eventually breaks, remember that we warned that parsing HTML with regex is a fool's errand. ;)

This:

var s        = "some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/><ul><li>";
var id       = 'item_1';

var patt     = new RegExp ("<p[^<>]*\\sid=['\"]" + id + "['\"](?:.|\\n|\\r)*<br[^<>]*\\sid=['\"]end_of_" + id + "['\"][^<>]*>", "ig");

var stripped = s.replace (patt, "");

Produces this:

"some text <p id='item_2' class='item'>another multiline content 

</p><br clear='all' id='end_of_item_2'/><ul><li>"
