Non-greedy regex is not working as expected

I need to extract part of a string using a non-greedy regex. I am working with the following string:

<a href="/guidance/">Hi</a> </li><li  > <a href="/news/institutional/2012/05/000001asdf">Thanks</a>

from which I need to get:

<a href="/news/institutional/2012/05/000001asdf">Thanks</a>

I've been trying the following regex:

<a.*?news/.*?/(\d{1,4}\/[01]?\d)?.*?</a>

but it matches the entire string instead of just the part shown above. As far as I understand, .*? captures the shortest match, but it's not working as expected.

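[^>] is a negated character class: it matches any character except a closing
angle bracket. Using it in place of the dot stops the non-greedy .*? from running
past the end of the tag (where it behaves semi-greedily) when it can't find the
specific news anchor. A lazy quantifier only keeps a match as short as possible
from the point where it starts; the engine still begins at the leftmost <a, so
.*? is free to expand across the first </a> until it reaches news/.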
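
A minimal sketch of that difference, assuming .NET's System.Text.RegularExpressions and the sample string from the question. The variable names (input, lazyDot, boundedClass) are illustrative and not from the original post; the snippet uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question.
string input = "<a href=\"/guidance/\">Hi</a> </li><li  > <a href=\"/news/institutional/2012/05/000001asdf\">Thanks</a>";

// Pattern from the question: the lazy dot is allowed to cross tag boundaries.
string lazyDot = @"<a.*?news/.*?/(\d{1,4}\/[01]?\d)?.*?</a>";

// Pattern from the answer: inside the opening tag, nothing may cross '>'.
string boundedClass = @"<a[^>]*?news/[^>/]*?/(\d{1,4}(?:/\d+)*)?[^>]*?>.*?</a>";

string lazyMatch = Regex.Match(input, lazyDot).Value;
string boundedMatch = Regex.Match(input, boundedClass, RegexOptions.Singleline).Value;

// The lazy-dot match starts at the first <a and swallows the whole input.
Console.WriteLine(lazyMatch == input);   // True

// The bounded match fails at the first tag and succeeds at the second one.
Console.WriteLine(boundedMatch);
```

With the bounded pattern, the attempt that starts at the first <a dies as soon as [^>]*? hits the > of that tag without having seen news/, so the engine moves on to the next <a.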

 #  @"(?s)<a[^>]*?news/[^>/]*?/(\d{1,4}(?:/\d+)*)?[^>]*?>.*?</a>"

 (?s)                  # Modifier, Dot-Matches any character
 <a                    # Open 'a' tag
 [^>]*?                # Any non '>' character
 news/                 # Need 'news/'
 [^>/]*?               # Any non '>' or '/' character
 /                     # Need '/'
 (                     # (1 start), Optional Date ?
      \d{1,4}               # 1-4 digit year
      (?: / \d+ )*          # month / day, etc ..
 )?                    # (1 end)
 [^>]*?                # Any non '>' character
 >                     # End Open '>' tag
 .*?                   # Anything
 </a>                  # Close 'a' tag 

C# example:

using System;
using System.Text.RegularExpressions;

// Two lines of sample markup; only the anchors pointing at /news/ should match.
string news = @"
<a href=""/guidance/"">Hi</a> </li><li  > <a href=""/news/institutional/2012/05/000001asdf"">Thanks</a>
<a href=""/rintime/"">Hi</a> <a href=""/news/google/asdf"">GOOGLE</a>
";
Regex RxNews = new Regex(@"(?s)<a[^>]*?news/[^>/]*?/(\d{1,4}(?:/\d+)*)?[^>]*?>.*?</a>" );
Match _mNews = RxNews.Match( news );
// Walk every match; group 1 holds the optional date-like path segment.
while (_mNews.Success)
{
    Console.WriteLine("Found: {0}\r\nGroup 1 = {1}\r\n", _mNews.Groups[0].Value, _mNews.Groups[1].Value);
    _mNews = _mNews.NextMatch();
}

Output:

Found: <a href="/news/institutional/2012/05/000001asdf">Thanks</a>
Group 1 = 2012/05/000001

Found: <a href="/news/google/asdf">GOOGLE</a>
Group 1 =
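
The commented, expanded layout of the pattern can also be handed to .NET as is. The sketch below assumes RegexOptions.IgnorePatternWhitespace, which makes the engine skip unescaped whitespace and treat # to the end of the line as a comment; RegexOptions.Singleline stands in for the inline (?s). The names pattern, rx and html are illustrative, and the input is a shortened version of the sample string.

```csharp
using System;
using System.Text.RegularExpressions;

// The expanded pattern, kept in its commented form.
string pattern = @"
 <a                    # Open 'a' tag
 [^>]*?                # Any non '>' character
 news/                 # Need 'news/'
 [^>/]*?               # Any non '>' or '/' character
 /                     # Need '/'
 (                     # (1 start), Optional Date ?
      \d{1,4}               # 1-4 digit year
      (?: / \d+ )*          # month / day, etc ..
 )?                    # (1 end)
 [^>]*?                # Any non '>' character
 >                     # End Open '>' tag
 .*?                   # Anything
 </a>                  # Close 'a' tag
";

var rx = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

string html = "<a href=\"/guidance/\">Hi</a> <a href=\"/news/institutional/2012/05/000001asdf\">Thanks</a>";
Match m = rx.Match(html);

Console.WriteLine(m.Value);            // the /news/ anchor only
Console.WriteLine(m.Groups[1].Value);  // 2012/05/000001
```

Keeping the comments next to each token makes later edits to the pattern easier to review, at the cost of having to escape any literal space or # that the pattern needs to match.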
