C# error: Unrecognized grouping construct

Question

Need help. Why am I getting an unhandled ArgumentException? The error says "Unrecognized grouping construct". Is my pattern wrong?

    WebClient client = new WebClient();
    string contents = client.DownloadString("http://site.com");

    string pattern = @"<td>\s*(?<no>\d+)\.\s*</td>\s*<td>\s*
                        <a class=""LN"" href=""[^""]*+"" 
                        onclick=""[^""]*+"">\s*+<b>(?<name>[^<]*+)
                        </b>\s*+</a>.*\s*</td>\s*+ 
                        <td align=""center"">[^<]*+</td>
                        \s*+<td>\s*+(?<locations>(?:<a href=""[^""]*+"">[^<]*+</a><br />\s*+)++)</td>";

    foreach (Match match in Regex.Matches(contents, pattern, RegexOptions.IgnoreCase))
    {
        string no = match.Groups["no"].Value;
        string name = match.Groups["name"].Value;
        string locations = match.Groups["locations"].Value;

        Console.WriteLine(no + " " + name + " " + locations);
    }

Answer 1

There is no such thing as ?P<name> in C#/.NET. The equivalent syntax is just ?<name>.

The P named-group syntax comes from PCRE/Python (Perl allows it as an extension).
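
A minimal sketch of the difference, assuming a made-up input string (`sample`) and reusing the `no` group from the question. It is written with top-level statements (C# 9 or later); on an older compiler, move the statements into a `Main` method.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input, invented for this illustration.
string sample = "<td> 12. </td>";

// Python/PCRE-style named group: the .NET parser rejects it.
bool pythonStyleRejected = false;
try
{
    Regex.Match(sample, @"<td>\s*(?P<no>\d+)\.\s*</td>");
}
catch (ArgumentException ex)
{
    // The message mentions an unrecognized grouping construct.
    pythonStyleRejected = true;
    Console.WriteLine(ex.Message);
}

// .NET-style named group: the same pattern without the P.
Match match = Regex.Match(sample, @"<td>\s*(?<no>\d+)\.\s*</td>");
string no = match.Groups["no"].Value;
Console.WriteLine(no);
```

The exception is thrown while the pattern is parsed, before any input is examined, which is why it shows up as soon as `Regex.Matches` is called.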

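You also need to remove all the nested (possessive) quantifiers, i.e. change *+ to * and ++ to +. If you want exactly the same behavior, you can rewrite X*+ as (?>X*), and likewise for ++.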
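
As a rough illustration of why `(?>X*)` rather than plain `X*` is the exact equivalent, here is a small sketch. The input string `digits` and the patterns are invented for the example; they are not taken from the question. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Possessive quantifier (Java/PCRE style): the .NET parser reports a nested quantifier.
bool possessiveRejected = false;
try
{
    new Regex(@"\d*+5");
}
catch (ArgumentException ex)
{
    possessiveRejected = true;
    Console.WriteLine(ex.Message);
}

// Hypothetical input, invented for this illustration.
string digits = "12345";

// Plain greedy quantifier: \d* takes all five digits, then gives one back so the 5 can match.
bool greedyMatches = Regex.IsMatch(digits, @"\d*5");

// Atomic group: (?>\d*) keeps everything it took, so the trailing 5 can never match.
bool atomicMatches = Regex.IsMatch(digits, @"(?>\d*)5");

Console.WriteLine(greedyMatches);
Console.WriteLine(atomicMatches);
```

For the patterns in the question the two forms will usually find the same matches, because `[^""]*` can never consume the closing quote anyway; the atomic group mainly avoids pointless backtracking when a match fails.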

Here is your regex, modified. I have also tried to comment it, but I can't guarantee that I haven't broken it in the process.

var regex = new Regex(
@"<td>                   # a td element
    \s*(?<no>\d+)\.\s*   # containing a number captured as 'no'
  </td>\s*
  <td>\s*                # followed by another td, containing
                         # an <a class=... href=... onclick=...> exactly
                         # (literal spaces are written as [ ] because bare whitespace is ignored here)
      <a[ ]class=""LN""[ ]href=""(?>[^""]*)""[ ]onclick=""(?>[^""]*)"">
         (?>\s*)                   # which contains
         <b>(?<name>(?>[^<]*))</b> # some text in bold captured as 'name'
         (?>\s*)
      </a>
      .*                 # then anything else on the same line
      \s*
  </td>                  # the end of a td, followed by whitespace
  (?>\s*)
  <td[ ]align=""center"">  # then a <td align=center> containing no other elements
    (?>[^<]*)
  </td>
  (?>\s*)
  <td>                   # lastly
    (?>\s*)
    (?<locations>        # a series of <a href=...>...</a><br />
        (?>(?:           # captured as 'locations'
            <a[ ]href=""(?>[^""]*)"">(?>[^<]*)</a>
            <br[ ]/>
            (?>\s*)
            )
        +))              # (containing at least one of these)
  </td>", RegexOptions.IgnorePatternWhitespace|RegexOptions.IgnoreCase);

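One thing to watch for with RegexOptions.IgnorePatternWhitespace: unescaped spaces in the pattern are dropped, so a space that is meant to match literally (such as the one between a tag name and its first attribute) has to be escaped or placed in a character class. A minimal sketch, using a made-up input string `html` and top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input, invented for this illustration.
string html = "<td align=\"center\">x</td>";

// The space between td and align is dropped from the pattern,
// so this effectively searches for <tdalign="center">.
bool unescaped = Regex.IsMatch(html, @"<td align=""center"">  # a centred cell",
    RegexOptions.IgnorePatternWhitespace);

// A space inside a character class is kept literally.
bool escaped = Regex.IsMatch(html, @"<td[ ]align=""center"">  # a centred cell",
    RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(unescaped);
Console.WriteLine(escaped);
```

Using `\s+` instead of `[ ]` would also work and is more tolerant of tabs or line breaks inside a tag.
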
But you really should be using something like the HTML Agility Pack.
