Removing part of a Regex.Match string

Question

I have an HTML table in a string. Most of the HTML comes from FrontPage, so it is badly formatted. Here is a quick sample of what it looks like.

<b>Table 1</b>
  <table class='class1'>
  <tr>
    <td>
      <p>Procedure Name</td>
    <td>
        <p>Procedure</td>
    </tr>
  </table>
<p><b>Table 2</b></p>
  <table class='class2'>
    <tr>
      <td>
        <p>Procedure Name</td>
        <td>
        <p>Procedure</td>
    </tr>
  </table>
<p> Some text is here</p>

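As I understand it, FrontPage automatically adds a <p> to every new cell.

I want to remove the <p> tags that are inside the tables but keep the ones outside the tables. I have tried two approaches so far:

First approach

The first approach was to use a single regex to capture every <p> tag inside a table and then use Regex.Replace() to remove them. However, I never managed to get the regex right. (I know that parsing HTML with a regex is a bad idea. I thought the data was simple enough to apply a regex to it.)

I can easily get everything inside each table with this regex: <table.*?>(.*?)</table>

Then I wanted to grab only the <p> tags, so I wrote this: (?<=<table.*?>)(<p>)(?=</table>) . This does not match anything. (Apparently .NET allows quantifiers in lookarounds. At least that was my impression from using http://regexhero.net/tester/)

Is there any way I can modify this regex to capture only what I need?

Second approach

The second approach was to capture only the table contents into a string and then use String.Replace() to remove the <p> tags. I am using the following code to capture the matches: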

MatchCollection tablematch = Regex.Matches(htmlSource, @"<table.*?>(.*?)</table>", RegexOptions.Singleline);

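htmlSource is a string containing the entire HTML page, and it is the variable that will be sent back to the client after processing. I want to remove only what I need to remove from htmlSource.

How can I use the MatchCollection to remove the <p> tags and then put the updated tables back into htmlSource?

Thanks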

Answer 1

This answer is based on the second suggested approach. Change the regex so that it matches each whole table, tags included:

<table.*?table>

Then use Regex.Replace with a MatchEvaluator that specifies the desired replacement:

Regex myRegex = new Regex(@"<table.*?table>", RegexOptions.Singleline);
string replaced = myRegex.Replace(htmlSource, m=> m.Value.Replace("<p>",""));
Console.WriteLine(replaced);

With the input from the question, the output is:

<b>Table 1</b>
    <table class='class1'>
    <tr>
    <td>
        Procedure Name</td>
    <td>
        Procedure</td>
    </tr>
    </table>
<p><b>Table 2</b></p>
    <table class='class2'>
    <tr>
        <td>
        Procedure Name</td>
        <td>
        Procedure</td>
    </tr>
    </table>
<p> Some text is here</p>
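
As a self-contained variation on the snippet above, here is a minimal sketch that also tolerates uppercase tags and <p> tags carrying attributes. The class name TableCleaner, the method name StripParagraphsInTables, and the sample input are made up for illustration; the question's data only contains bare <p> tags, so the wider pattern is an assumption about what FrontPage might emit.

```csharp
using System;
using System.Text.RegularExpressions;

static class TableCleaner
{
    // Matches one whole table, from the opening tag to the first closing tag.
    static readonly Regex TableRx = new Regex(
        @"<table\b.*?</table\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Matches an opening <p> tag with or without attributes.
    // The \b keeps it from matching <pre>, <param> and similar tags.
    static readonly Regex ParagraphRx = new Regex(
        @"<p\b[^>]*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: removes opening <p> tags inside tables only.
    public static string StripParagraphsInTables(string html)
    {
        // The evaluator runs once per table; text outside tables is untouched.
        return TableRx.Replace(html, m => ParagraphRx.Replace(m.Value, ""));
    }

    static void Main()
    {
        string html =
            "<p>intro</p><table><tr><td><P class='x'>cell</td></tr></table><p>outro</p>";
        Console.WriteLine(StripParagraphsInTables(html));
    }
}
```

Closing </p> tags are left alone here because the sample data has none inside the tables; if they can occur, the inner pattern would need to become something like </?p\b[^>]*>.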

Answer 2

I think it can be done with a delegate (callback).

string html = @"
<b>Table 1</b>
  <table class='class1'>
  <tr>
    <td>
      <p>Procedure Name</td>
    <td>
        <p>Procedure</td>
    </tr>
  </table>
<p><b>Table 2</b></p>
  <table class='class2'>
    <tr>
      <td>
        <p>Procedure Name</td>
        <td>
        <p>Procedure</td>
    </tr>
  </table>
<p> Some text is here</p>
";

Regex RxTable = new Regex( @"(?s)(<table[^>]*>)(.+?)(</table\s*>)" );
Regex RxP = new Regex( @"<p>" );

string htmlNew = RxTable.Replace( 
    html,
    delegate(Match match)
    {
       return match.Groups[1].Value + RxP.Replace(match.Groups[2].Value, "") + match.Groups[3].Value;
    }
);
Console.WriteLine( htmlNew );

Output:

<b>Table 1</b>
  <table class='class1'>
  <tr>
    <td>
      Procedure Name</td>
    <td>
        Procedure</td>
    </tr>
  </table>
<p><b>Table 2</b></p>
  <table class='class2'>
    <tr>
      <td>
        Procedure Name</td>
        <td>
        Procedure</td>
    </tr>
  </table>
<p> Some text is here</p>
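
Regex.Replace with a callback does the splicing back into the source string for you. For comparison, doing the same by hand with the MatchCollection from the question might look like the sketch below. It reuses the question's pattern and the htmlSource name; the class name, method name, and sample input are made up for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class MatchCollectionDemo
{
    // Hypothetical helper: rebuilds the page from the matches' positions.
    public static string StripParagraphsInTables(string htmlSource)
    {
        MatchCollection tables = Regex.Matches(
            htmlSource, @"<table.*?>(.*?)</table>", RegexOptions.Singleline);

        var result = new StringBuilder();
        int last = 0; // index just past the previously copied table

        foreach (Match table in tables)
        {
            // Copy the text between the previous table and this one unchanged.
            result.Append(htmlSource, last, table.Index - last);
            // Copy the table itself with its <p> tags removed.
            result.Append(table.Value.Replace("<p>", ""));
            last = table.Index + table.Length;
        }

        // Copy whatever follows the last table.
        result.Append(htmlSource, last, htmlSource.Length - last);
        return result.ToString();
    }

    static void Main()
    {
        string htmlSource =
            "<p>a</p><table><tr><td><p>b</td></tr></table><p>c</p>";
        Console.WriteLine(StripParagraphsInTables(htmlSource));
    }
}
```

This makes the bookkeeping explicit (Match.Index and Match.Length locate each table in the original string), which is the work the MatchEvaluator overload hides.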

Answer 3

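In general, regular expressions (in .NET, through balancing groups) let you work with nested structures. It is very ugly and you should avoid it, but if you have no other option, you can do it.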

static void Main()
{
    string s = 
@"A()
{
    for()
    {
    }
    do
    {
    }
}
B()
{
    for()
    {
    }   
}
C()
{
    for()
    {
        for()
        {
        }
    }   
}";

    var r = new Regex(@"  
                      {                       
                          (                 
                              [^{}]           # everything except braces { }   
                              |
                              (?<open>  { )   # if { then push
                              |
                              (?<-open> } )   # if } then pop
                          )+
                          (?(open)(?!))       # true if stack is empty
                      }                                                                  

                    ", RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

    int counter = 0;

    foreach (Match m in r.Matches(s))
        Console.WriteLine("Outer block #{0}\r\n{1}", ++counter, m.Value);

    Console.Read();
}

Here the regex "knows" where a block starts and where it ends, so you can use that information to remove a <p> tag that has no matching closing tag.
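
Applied to the question's HTML, the same balancing-group idea could be sketched as follows. This is an illustration under assumptions, not something the sample data requires: it only matters if tables can be nested, which the question does not show. The class name, method name, and sample input are made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class NestedTableDemo
{
    // Matches an outermost table together with any tables nested inside it.
    static readonly Regex OuterTableRx = new Regex(@"
        <table\b[^>]*>                  # opening tag of the outermost table
        (?>
            <table\b[^>]*>  (?<open>)   # nested opening tag: push
          |
            </table\s*>     (?<-open>)  # nested closing tag: pop, fails when nothing is open
          |
            (?!</?table\b) .            # any character that does not start a table tag
        )*
        (?(open)(?!))                   # fail if a nested table is still open
        </table\s*>                     # closing tag of the outermost table
        ",
        RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Singleline
        | RegexOptions.IgnoreCase);

    // Hypothetical helper: removes <p> tags inside outermost tables,
    // including the parts that come after a nested table.
    public static string StripParagraphsInTables(string html)
    {
        return OuterTableRx.Replace(html, m => m.Value.Replace("<p>", ""));
    }

    static void Main()
    {
        string html =
            "<table><tr><td><table><tr><td><p>in</td></tr></table>" +
            "<p>after</td></tr></table><p>out</p>";
        Console.WriteLine(StripParagraphsInTables(html));
    }
}
```

A lazy pattern such as <table.*?</table> would stop at the inner closing tag and leave the <p> before "after" in place; counting the open tables is what lets the match run to the outer closing tag.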

3 answers

Answer 1
1 accepted 2015-06-08 17:59:24

Answer 2
1 2015-06-08 18:11:18

Answer 3
0 2015-06-08 16:00:22