
Remove consecutive <br> from string using regex C#

I have the following string:

"choose to still go on the trip. <br><br>\r\nNote that when booking"

After converting it with a regex, I need to replace the consecutive <br> tags with a single <br>, so the string would look like this:

"choose to still go on the trip. <br>Note that when booking"

This can be done in another (safer) way using the HTML Agility Pack (an open-source project, http://html-agility-pack.net).

It handles the various notations (<br>, <br/>, <br />) without you having to worry about them. This means you can focus on the actual task: replacing duplicates.

See "Remove chain of duplicate elements with HTML Agility Pack", which explains an approach for replacing duplicates.

If you need to account for the case where there is whitespace between the tags, try the following regex:

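// Two or more <br> tags, each optionally preceded by whitespace, with any
// inner spacing, upper/lower case, and an optional self-closing slash.
myInputStr = Regex.Replace(myInputStr,
    @"(\s*<\s*[bB][rR]\s*/?\s*>){2,}",
    "<br>");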

This regex replaces two or more consecutive <br> tags with a single <br>, regardless of how each tag is written (spacing, casing, self-closing, etc.).
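A quick way to check what this pattern does to the sample string is sketched below. It uses plain `\s` for the whitespace parts (in .NET, `\b` inside a character class means a backspace character, not a word boundary, so `[\b\s]` behaves like `\s` on ordinary text). Because each group matches whitespace *before* a tag, the space ahead of the first <br> is swallowed while the "\r\n" after the run is kept. `CollapseTrailing` is a hypothetical variant, not part of the answer, that matches whitespace *after* each tag instead and so reproduces the output asked for in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: whitespace is matched before each tag.
static string CollapseLeading(string input) =>
    Regex.Replace(input, @"(\s*<\s*[bB][rR]\s*/?\s*>){2,}", "<br>");

// Hypothetical variant: whitespace is matched after each tag, so the space
// before the run survives and the trailing line break is removed.
static string CollapseTrailing(string input) =>
    Regex.Replace(input, @"(<\s*br\s*/?\s*>\s*){2,}", "<br>",
                  RegexOptions.IgnoreCase);

string sample = "choose to still go on the trip. <br><br>\r\nNote that when booking";

// Prints "choose to still go on the trip.<br>", a line break, then "Note that when booking".
Console.WriteLine(CollapseLeading(sample));
// Prints "choose to still go on the trip. <br>Note that when booking".
Console.WriteLine(CollapseTrailing(sample));
// Mixed forms are collapsed too: prints "a<br>b".
Console.WriteLine(CollapseLeading("a<br/> <BR >b"));
```

Which variant fits depends on whether the whitespace around the run should be kept; a single <br> is left untouched by both, because of the `{2,}` quantifier.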

EDIT: If you don't know how many <br> tags you have, you can do this:

  1. Split your string on <br> and remove empty entries.
  2. Join the parts again with a single <br>.

Here is the code:

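using System;
using System.Linq;

string yourString = "choose to still go on the trip. <br><br>\r\nNote that when booking";

// Split on every <br>; RemoveEmptyEntries drops the empty piece between two
// adjacent tags, and the Where clause also drops whitespace-only pieces
// such as the one in "<br> <br>" or "<br>\r\n<br>".
var temp =
    yourString.Split(new string[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries)
              .Where(i => !string.IsNullOrWhiteSpace(i));

// Re-join the remaining pieces with a single <br>.
string result = string.Join("<br>", temp);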
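The Split/Join approach can be wrapped in a small helper to see its edge cases; this is a sketch, and the helper name `CollapseBySplit` is hypothetical. It drops whitespace-only pieces with `string.IsNullOrWhiteSpace`. Two side effects are worth knowing: <br> tags at the very start or end of the string disappear completely, and only the exact lowercase text "<br>" is treated as a separator, since `Split` compares ordinally.

```csharp
using System;
using System.Linq;

// Split on "<br>", drop empty and whitespace-only pieces, re-join with one "<br>".
static string CollapseBySplit(string s) =>
    string.Join("<br>",
        s.Split(new[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries)
         .Where(part => !string.IsNullOrWhiteSpace(part)));

Console.WriteLine(CollapseBySplit("a<br><br>b"));          // a<br>b
Console.WriteLine(CollapseBySplit("<br>a<br> <br>b<br>")); // a<br>b (leading/trailing tags are dropped)
Console.WriteLine(CollapseBySplit("a<BR><BR>b"));          // a<BR><BR>b (uppercase tags are not matched)
```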

As Martin Eden suggested:

while (text.Contains("<br><br>")) 
{ 
    text = text.Replace("<br><br>", "<br>"); 
}    

or

string newString = oldString.Replace("<br><br><br>", "<br>");
newString = newString.Replace("<br><br>", "<br>");

and add more such lines for longer runs of <br>, longest first.
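The difference between the `while` loop and a fixed number of `Replace` calls shows up on long runs. In the sketch below (helper names are hypothetical), each pass of the loop roughly halves a run of adjacent tags and repeats until no "<br><br>" pair is left, whereas the two fixed calls only fully collapse runs of up to four tags. Neither variant touches tags separated by whitespace.

```csharp
using System;

// The loop approach: keep replacing pairs until none remain.
static string CollapseByLoop(string text)
{
    while (text.Contains("<br><br>"))
    {
        text = text.Replace("<br><br>", "<br>");
    }
    return text;
}

// The fixed approach: one pass for triples, then one pass for pairs.
static string CollapseByTwoReplaces(string text)
{
    string s = text.Replace("<br><br><br>", "<br>");
    return s.Replace("<br><br>", "<br>");
}

Console.WriteLine(CollapseByLoop("a<br><br><br><br><br>b"));        // a<br>b
Console.WriteLine(CollapseByTwoReplaces("a<br><br><br><br>b"));     // a<br>b
Console.WriteLine(CollapseByTwoReplaces("a<br><br><br><br><br>b")); // a<br><br>b (five tags is too many)
Console.WriteLine(CollapseByLoop("a<br> <br>b"));                   // a<br> <br>b (not adjacent)
```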

string output = Regex.Replace(input, @"(<br\s*/{0,1}>\s*(</\s*br>)*){2,}", "<br>",
    RegexOptions.CultureInvariant |
    RegexOptions.IgnoreCase |
    RegexOptions.Multiline);

It replaces any run of two or more <br>, <br/> or <br></br> tags with a single <br>.

This takes whitespace into account: <br > matches as well, as do <br /> and <br > </ br>.

RegexOptions.Multiline only changes how ^ and $ behave, and this pattern uses neither, so it can be omitted; the \s* after each tag already matches the "\r\n".
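A minimal check of this pattern against the question's sample is sketched below; the helper name `CollapseBr` is hypothetical. `RegexOptions.Multiline` is left out since the pattern has no ^ or $. Because each group ends with `\s*`, the line break after the run is consumed, which gives exactly the output asked for in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Two or more <br>, <br/>, <br /> tags, each optionally followed by
// whitespace and any number of closing </br> tags.
static string CollapseBr(string input) =>
    Regex.Replace(input, @"(<br\s*/{0,1}>\s*(</\s*br>)*){2,}", "<br>",
                  RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

string sample = "choose to still go on the trip. <br><br>\r\nNote that when booking";

Console.WriteLine(CollapseBr(sample));                // choose to still go on the trip. <br>Note that when booking
Console.WriteLine(CollapseBr("a<br > </ br><BR/>b")); // a<br>b
Console.WriteLine(CollapseBr("a<br>b"));              // a<br>b (a single tag is left alone)
```

Note that a space right after the opening `<` (as in `< br>`) is not covered by this pattern.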

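Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.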

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM