
How to match these strings with Regex?

<div> 

            <a href="http://website/forum/f80/ThreadLink-new/" id="thread_gotonew_565407"><img class="inlineimg" src="http://website/forum/images/buttons/firstnew.gif" alt="Go to first new post" border="0" /></a> 



            [MULTI]
            <a href="http://website/forum/f80/ThreadLink/" id="thread_title_565407" style="font-weight:bold">THREAD TITLE</a> 

        </div> 

I know for a fact that the link I am interested in is going to be bold:

font-weight:bold

But the link address itself comes before the style attribute. How can I match both the link address:

http://website/forum/f80/ThreadLink/

and the thread title:

THREAD TITLE

EDIT: The HTML that Internet Explorer produces is very different:

  <A style="FONT-WEIGHT: bold" id=thread_title_565714 
      href="http://LinkAddress-565714/">ThreadTitle</A> </DIV>

.*<a href="(.*?)".*style="font-weight:bold">(.*?)</a>

Match group 1: URL
Match group 2: Thread title

This will match any bold link. If you want to match a particular one, replace the (.*?) groups with the values you are looking for.
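As a rough sketch of how the two groups could be read in C#, assuming the markup from the question with each link on its own line: the class name `BoldLinkGroups`, the `Html` sample, and the `FindBoldLink` helper are made up for illustration. The pattern is case-sensitive as written, so it would not match the upper-case Internet Explorer markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoldLinkGroups
{
    // Hypothetical sample input, trimmed from the question's markup.
    public const string Html =
        "<a href=\"http://website/forum/f80/ThreadLink-new/\" id=\"thread_gotonew_565407\"><img class=\"inlineimg\" /></a>\n" +
        "[MULTI]\n" +
        "<a href=\"http://website/forum/f80/ThreadLink/\" id=\"thread_title_565407\" style=\"font-weight:bold\">THREAD TITLE</a>";

    // By default "." does not match "\n" in .NET, so a match stays on one line.
    public static Match FindBoldLink(string html)
    {
        return Regex.Match(html, @".*<a href=""(.*?)"".*style=""font-weight:bold"">(.*?)</a>");
    }

    public static void Main()
    {
        Match m = FindBoldLink(Html);
        if (m.Success)
        {
            Console.WriteLine("Url={0}", m.Groups[1].Value);   // group 1: link address
            Console.WriteLine("Title={0}", m.Groups[2].Value); // group 2: thread title
        }
    }
}
```

On this sample the first link is skipped because its line contains no bold style, and the second link supplies both groups.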

Try this:

<A style="FONT-WEIGHT: bold" id=(?<id>\S+)\s+href="(?<url>.*?)">(?<title>.*?)</A>

So you can use:

// Needs: using System; using System.Text.RegularExpressions;
// `input` holds the HTML to search.
// \S+ captures the unquoted id; \s+ skips the whitespace (including the line break) before href.
Regex link = new Regex(@"<A style=""FONT-WEIGHT: bold"" id=(?<id>\S+)\s+href=""(?<url>.*?)"">(?<title>.*?)</A>");
foreach (Match match in link.Matches(input))
{
    Console.WriteLine(
        "Id={0}, Url={1}, Title={2}",
        match.Groups["id"].Value,
        match.Groups["url"].Value,
        match.Groups["title"].Value);
}

<a href="([^"]*)"[^>]*style="[^"]*font-weight:bold[^"]*"[^>]*>([^<]*)</a>

Much the same as the previous answers, except that I've replaced their .* with [^"]* and similar negated character classes. In the first group, this prevents the match from running past the next double-quote character. Without it, you could match too much when the input looks like this:

<a href="#dont_match_me">Don't match me</a><br/>
<a href="http://website/forum/f80/ThreadLink/" style="font-weight:bold">THREAD TITLE</a>
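
The difference can be sketched in C# as follows. This is an illustration under stated assumptions, not the exact patterns from the earlier answers: `Loose` is a simplified `.*`-based variant without the leading `.*`, run with `RegexOptions.Singleline` so that `.` also crosses line breaks, and the names `BoldLinkDemo`, `Sample`, `Loose`, `Strict`, and `FirstUrl` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoldLinkDemo
{
    // Two links; only the second one is bold.
    public const string Sample =
        "<a href=\"#dont_match_me\">Don't match me</a><br/>\n" +
        "<a href=\"http://website/forum/f80/ThreadLink/\" style=\"font-weight:bold\">THREAD TITLE</a>";

    // Assumed variant: ".*" is free to run across tags (and, with Singleline, across lines).
    public static readonly Regex Loose = new Regex(
        @"<a href=""(.*?)"".*style=""font-weight:bold"">(.*?)</a>",
        RegexOptions.Singleline);

    // Negated classes keep each piece inside its own attribute value or tag.
    public static readonly Regex Strict = new Regex(
        @"<a href=""([^""]*)""[^>]*style=""[^""]*font-weight:bold[^""]*""[^>]*>([^<]*)</a>");

    // Returns the first captured URL, or an empty string when nothing matches.
    public static string FirstUrl(Regex regex, string html)
    {
        Match m = regex.Match(html);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(FirstUrl(Loose, Sample));  // #dont_match_me
        Console.WriteLine(FirstUrl(Strict, Sample)); // http://website/forum/f80/ThreadLink/
    }
}
```

The loose variant pairs the first link's address with the bold link's title, because `.*` can skip over the end of the first tag. The strict pattern cannot leave the tag it started in, so it only reports the bold link.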
