Remove a string's parent tag from an HTML string, up to its closing tag

Question

I have a very long HTML string. What I am trying to accomplish is to delete a section of it: everything from the <tr> tag that encloses a given piece of text through the matching closing </tr> tag.

So when I call the method RemoveSection with the text "Search Integration":

HTML before

    <tr>
    <td class=\"SectionHeaderHolder\" colspan=\"4\">
    <p class=\"SectionHeader\">Header XX<span class=\"help\">Help</span></p>
    </td>
    </tr>

    <tr>
    <td class=\"SectionHeaderHolder\" colspan=\"4\">
    <p class=\"SectionHeader\">Search Integration<span class=\"help\">Help</span></p>
    </td>
    </tr>

    <tr>
    <td class=\"SectionHeaderHolder\" colspan=\"4\">
    <p class=\"SectionHeader\">Header YY<span class=\"help\">Help</span></p>
    </td>
    </tr>

The string passed to the remove function will always appear directly inside a <p class=\"SectionHeader\"> element.
There will be only one section containing that string, so it is enough for the remove function to handle the first occurrence.

HTML after

    <tr>
    <td class=\"SectionHeaderHolder\" colspan=\"4\">
    <p class=\"SectionHeader\">Header XX<span class=\"help\">Help</span></p>
    </td>
    </tr>

    <tr>
    <td class=\"SectionHeaderHolder\" colspan=\"4\">
    <p class=\"SectionHeader\">Header YY<span class=\"help\">Help</span></p>
    </td>
    </tr>

Answer 1

You could use HtmlAgilityPack for this. A simple LinqPad example:

void Main()
{
    string input = "<tr><td class=\"SectionHeaderHolder\" colspan=\"4\"><p class=\"SectionHeader\">Header XX<span class=\"help\">Help</span></p></td></tr>"
                + "<tr><td class=\"SectionHeaderHolder\" colspan=\"4\">    <p class=\"SectionHeader\">Search Integration<span class=\"help\">Help</span></p>    </td>    </tr>"
                + "<tr><td class=\"SectionHeaderHolder\" colspan=\"4\">    <p class=\"SectionHeader\">Header YY<span class=\"help\">Help</span></p>    </td>    </tr>";

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(input);

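    // Find the <p> that has a text node equal to "Search Integration".
    HtmlNode header = doc.DocumentNode.SelectSingleNode("//p[text()='Search Integration']");

    // Walk up <p> -> <td> -> <tr> and remove the whole row.
    // SelectSingleNode returns null when nothing matches, hence the null-conditional.
    header?.ParentNode.ParentNode.Remove();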

    string output = doc.DocumentNode.OuterHtml;

    input.Dump();
    output.Dump();
}
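
The LinqPad example above can be wrapped into the RemoveSection method the question asks for. The following is a minimal sketch, not the answer author's code: the class name SectionRemover and the method signature are assumptions made for illustration. It uses LINQ over the parsed nodes instead of an XPath string (so quotes in the header text need no escaping) and looks up the nearest enclosing <tr> rather than hard-coding two ParentNode hops. It assumes the HTML uses real quotes (class="SectionHeader"), which is what the escaped \" sequences produce inside a C# string literal.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class SectionRemover // hypothetical name, not from the original answer
{
    // Removes the first <tr> that contains a <p class="SectionHeader"> with a
    // direct text node equal to headerText. Returns the input unchanged when
    // no such row exists.
    public static string RemoveSection(string html, string headerText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match on the <p>'s own text nodes so the nested <span> text is ignored.
        HtmlNode header = doc.DocumentNode
            .Descendants("p")
            .FirstOrDefault(p =>
                p.GetAttributeValue("class", "") == "SectionHeader" &&
                p.ChildNodes.Any(n =>
                    n.NodeType == HtmlNodeType.Text &&
                    HtmlEntity.DeEntitize(n.InnerText).Trim() == headerText));

        // Nearest enclosing <tr>, or null when the header was not found.
        HtmlNode row = header?.Ancestors("tr").FirstOrDefault();
        if (row == null)
        {
            return html;
        }

        row.Remove();
        return doc.DocumentNode.OuterHtml;
    }
}
```

Usage would look like `string cleaned = SectionRemover.RemoveSection(html, "Search Integration");`. Whitespace between the remaining rows is left as it was in the input.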

Answer 2

While I'd still recommend the accepted solution, the same thing can be done using a plain regex:

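// Requires: using System.Text.RegularExpressions;
// `text` holds the full HTML string.
string search = "Search Integration";

// Match from "<tr" to "</tr>" around the search text. The (?!</?tr) lookaheads
// stop the match from crossing into another row's opening or closing tag.
string pattern = "<tr(?:(?!</?tr).)*" + search + "(?:(?!</?tr).)*</tr>";
// Singleline lets '.' match newlines, since a row spans several lines.
Regex r = new Regex(pattern, RegexOptions.Singleline);
string result = r.Replace(text, "");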

Demo: https://dotnetfiddle.net/OcV6E5
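
For completeness, here is a self-contained sketch of the regex approach shaped like the RemoveSection method from the question. It is an illustration under assumptions rather than the answer author's code: the class name RegexSectionRemover is made up, the search text is passed through Regex.Escape so characters such as "." or "(" are treated literally, and the replacement count is limited to 1 because the question says only the first occurrence needs handling. It still assumes rows are never nested inside one another, which a regex cannot handle reliably.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSectionRemover // hypothetical name, for illustration only
{
    // Removes the first <tr>...</tr> block that contains headerText.
    public static string RemoveSection(string html, string headerText)
    {
        // <tr\b               opening tag of a row (not e.g. <track>)
        // (?:(?!</?tr\b).)*?  any characters, without crossing another <tr or </tr
        // </tr>               closing tag of the same row
        string pattern =
            @"<tr\b(?:(?!</?tr\b).)*?"
            + Regex.Escape(headerText)
            + @"(?:(?!</?tr\b).)*?</tr>";

        // Singleline lets '.' match newlines, since a row spans several lines.
        var regex = new Regex(pattern, RegexOptions.Singleline);

        // Replace at most one match.
        return regex.Replace(html, "", 1);
    }
}
```

Calling `RegexSectionRemover.RemoveSection(text, "Search Integration")` on the sample input leaves the "Header XX" and "Header YY" rows in place, along with whatever whitespace separated the rows.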

2 answers

Answer 1
1 accepted 2016-09-07 08:58:40

Answer 2
1 2016-09-07 14:20:00
