
Remove text enclosed in a div tag using C# Regex

I have a string as follows: string chart = "<div id=\"divOne\">Label.</div>;"; It is generated dynamically, outside my control, and I would like to remove the text "Label." from the enclosing div element.

I tried the following, but my regex knowledge is still too limited to get it working: System.Text.RegularExpressions.Regex.Replace(chart, @"/(<div[^>]+>)[^<]+(<\/div>)/i", "");

Using LINQPad, I got this snippet working. Hopefully it solves your problem.

string chart = "<div id=\"divOne\">Label.</div>;";

var regex = new System.Text.RegularExpressions.Regex(@">.*<");

var result = regex.Replace(chart, "><");

result.Dump(); // LINQPad's Dump(); prints <div id="divOne"></div>;

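Essentially, it finds all the characters between the opposing angle brackets and replaces them. Note that .* is greedy: it matches from the first > to the last < in the whole string, so if the string contains more than one element, the markup in between is removed as well.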
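To see what the greedy .* does when the string holds more than one element, here is a minimal sketch comparing the pattern from the snippet with a variant that uses the negated character class [^<]* instead. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Greedy: ".*" runs from the FIRST '>' to the LAST '<' in the whole input,
    // so with several elements it swallows the tags in between.
    public static string StripGreedy(string html) =>
        Regex.Replace(html, @">.*<", "><");

    // Negated class: "[^<]*" cannot cross a '<', so each element's text
    // is matched (and removed) separately and all tags survive.
    public static string StripPerElement(string html) =>
        Regex.Replace(html, @">[^<]*<", "><");
}
```

For "<div>a</div><div>b</div>", StripGreedy returns "<div></div>" (one of the two divs is gone), while StripPerElement returns "<div></div><div></div>". On the single-div string from the question, both give the same result.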

The approach you take depends on how robust the replacement needs to be. If you're using this more generally and want to target a specific node, you should use a MatchEvaluator. This example produces a similar result:

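string pattern = @"<(?<element>\w*) (?<attrs>.*)>(?<contents>.*)</(?<elementClose>.*>)";

// Cut out exactly the captured "contents" span (string.Replace would throw on an empty capture).
var x = System.Text.RegularExpressions.Regex.Replace(chart, pattern, m =>
{
    var contents = m.Groups["contents"];
    return m.Value.Remove(contents.Index - m.Index, contents.Length);
});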

The pattern you use in this case is customizable, but it takes advantage of named group captures, which let you isolate portions of the match and refer to them by name.
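Named groups can be referred to by name both in code (Groups["name"]) and in a replacement string (${name}). Here is a minimal sketch with a simpler, hypothetical three-group pattern (not the one above) that reads the div's text and also removes it without a MatchEvaluator:

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Illustrative pattern with three named groups: the opening tag,
    // the text content, and the closing tag.
    static readonly Regex DivPattern =
        new Regex(@"(?<open><div\b[^>]*>)(?<contents>[^<]*)(?<close></div>)");

    // Read a captured portion by name from the first match.
    public static string GetContents(string html)
    {
        Match m = DivPattern.Match(html);
        return m.Success ? m.Groups["contents"].Value : "";
    }

    // Refer to groups by name in the replacement string with ${name}:
    // keep the tags, drop the contents.
    public static string RemoveContents(string html) =>
        DivPattern.Replace(html, "${open}${close}");
}
```

With the question's string, GetContents returns "Label." and RemoveContents returns <div id="divOne"></div>; with the trailing ; untouched.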

Try this for your regex:

<div\b[^>]*>(.*?)<\/div>

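The following produces the output <div></div>. Because the replacement is the literal string "<div></div>", any attributes on the original tag (such as id="divOne") are not kept: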

System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"<div\b[^>]*>(.*?)<\/div>");
Console.WriteLine(regex.Replace("<div>Label 1.</div>","<div></div>"));
Console.ReadLine();
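If the opening tag's attributes (such as id) should survive, one option is to capture the tag and reuse it in the replacement. Here is a minimal sketch with made-up names. It also passes RegexOptions.Singleline so that . matches newlines when the div's content spans several lines:

```csharp
using System.Text.RegularExpressions;

public static class LazyDivDemo
{
    // Group 1 captures the opening tag so its attributes survive; the lazy
    // "(.*?)" stops at the first </div>. Singleline lets '.' match '\n' too.
    static readonly Regex Div =
        new Regex(@"(<div\b[^>]*>)(.*?)(</div>)", RegexOptions.Singleline);

    // Keep the opening tag (group 1) and the closing tag (group 3).
    public static string EmptyDivs(string html) => Div.Replace(html, "$1$3");
}
```

Because the lazy match stops at the first </div>, nested divs are not handled correctly. That is one of the usual limits of doing this with a regex.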

Your regex looks good to me, but don't include the /.../i delimiters and modifier, and use $1$2 as your replacement string:

var re = new System.Text.RegularExpressions.Regex(@"(?i)(<div[^>]+>)[^<]+(<\/div>)");
var result = re.Replace(chart, "$1$2"); // <div id="divOne"></div>;
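.NET has no /pattern/i literal syntax. The /i modifier is expressed either with the inline (?i) flag, as above, or with RegexOptions.IgnoreCase. Here is a minimal sketch showing that both forms behave the same (class and method names are illustrative):

```csharp
using System.Text.RegularExpressions;

public static class CaseInsensitiveDemo
{
    // Case-insensitivity via the inline (?i) flag inside the pattern.
    public static string WithInlineFlag(string html) =>
        Regex.Replace(html, @"(?i)(<div[^>]+>)[^<]+(<\/div>)", "$1$2");

    // The same via RegexOptions.IgnoreCase passed alongside the pattern.
    public static string WithOptions(string html) =>
        Regex.Replace(html, @"(<div[^>]+>)[^<]+(<\/div>)", "$1$2", RegexOptions.IgnoreCase);
}
```

Note that <div[^>]+> needs at least one character after "<div", so a bare <div> with no attributes is not matched by this pattern.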

You just need to write a pattern that selects only the text inside the div tag, and then:

Regex.Replace(chart, yourPattern, string.Empty);
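One possible choice for such a pattern (an assumption, not something the answer specifies) uses a lookbehind for the opening tag and a lookahead for the closing tag. The match then contains only the text, so string.Empty can replace it directly. .NET supports the variable-length lookbehind that [^>]* needs.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Lookbehind: preceded by an opening <div ...> tag.
    // [^<]*: the text itself (the only part that is actually matched).
    // Lookahead: followed by </div>.
    const string TextInsideDiv = @"(?<=<div\b[^>]*>)[^<]*(?=</div>)";

    public static string RemoveDivText(string chart) =>
        Regex.Replace(chart, TextInsideDiv, string.Empty);
}
```

With the question's string this returns <div id="divOne"></div>; and elements other than div are left alone.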

I'm a little confused by your question. It sounds like you are parsing some pre-generated HTML and want to remove every instance of the value of chart that occurs within a <div> tag. If that's correct, try this pattern (Regex.Escape makes characters such as the . in "Label." match literally instead of as regex metacharacters):

"(<div[^>]*>[^<]*)" + Regex.Escape(chart) + "([^<]*</div>)"

Concatenate the first and second groups (that is, use "$1$2" as the replacement) and you should get your <div> back without chart.
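Here is a minimal sketch of that approach as a reusable method. The class and method names are hypothetical. Because the leading [^<]* is greedy, each div loses only the last occurrence of the value per call.

```csharp
using System.Text.RegularExpressions;

public static class EscapedValueDemo
{
    // Removes one occurrence of `value` (the last one, because the leading
    // [^<]* is greedy) from the text of each <div>...</div>.
    public static string RemoveValueFromDivs(string html, string value)
    {
        // Regex.Escape makes '.' and other metacharacters match literally;
        // unescaped, "Label." would also match "Label!".
        string pattern = "(<div[^>]*>[^<]*)" + Regex.Escape(value) + "([^<]*</div>)";

        // Keep groups 1 and 2, dropping only the matched value between them.
        return Regex.Replace(html, pattern, "$1$2");
    }
}
```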

Here is a better way than Regex:

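using System.Xml.Linq;

var element = XElement.Parse("<div id=\"divOne\">Label.</div>"); // input must be well-formed XML with a single root
element.Value = "";             // clear the text, keep the tag and its attributes
var value = element.ToString(); // <div id="divOne"></div>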
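XElement.Parse accepts exactly one root element, so the question's actual string, with its trailing ;, would make it throw. Here is a minimal sketch that works around this by wrapping the markup in a made-up <root> element and clearing every div. The names are illustrative, and the input still has to be well-formed XML, which real-world HTML often is not.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlDemo
{
    public static string ClearDivs(string markup)
    {
        // Hypothetical wrapper element so that "<div>...</div>;" is a single root.
        var root = XElement.Parse("<root>" + markup + "</root>");

        // Materialize the list first, because clearing a div removes its child nodes.
        foreach (var div in root.Descendants("div").ToList())
        {
            // Setting Value to "" keeps a <div></div> start/end tag pair
            // (RemoveNodes() would serialize as a self-closing <div />).
            div.Value = "";
        }

        // Serialize the root's children back out, without the wrapper.
        return string.Concat(root.Nodes().Select(n => n.ToString()));
    }
}
```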

RegEx match open tags except XHTML self-contained tags

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.