
Regex to get the link in href. [asp.net]

Hi, I'm having trouble getting my regex to work. I'm working with C# ASP.NET. I'll post the code I'm using now; the part I can't get to work is the second regex, which should get whatever is in the href="LINK".

Thanks in advance.

var textBody = "lorem ipsum... <a href='http://www.link.com'>link</a>";

var urlTagPattern = new Regex(@"<a.*?href=[""'](?<url>.*?)[""'].*?>(?<name>.*?)</a>", RegexOptions.IgnoreCase);

// THIS IS THE REGEX
var hrefPattern = new Regex(@"HREF={:q}\>", RegexOptions.IgnoreCase);

var urls = urlTagPattern.Matches(textBody);

foreach (Match url in urls)
{
    var hrefs = hrefPattern.Match(url.ToString());

    litStatus.Text = hrefs.ToString();
}

Welcome to your daily installment of Don't Use Regex To Parse HTML. In this edition of Don't Use Regex To Parse HTML, we'll be reminding you not to use regex to parse HTML, because HTML cannot reliably be parsed by a regex and dozens of valid HTML constructs will break the naïve regex proposed. We won't be mentioning all the additional invalid ones in common use on the web in Don't Use Regex To Parse HTML today.

Also in Don't Use Regex To Parse HTML, we'll be linking to the Html Agility Pack, a .NET library you can use to parse HTML properly and subsequently extract link URLs reliably in just a couple of lines of code (a very similar example being present on that page).

We hope you have enjoyed today's Don't Use Regex To Parse HTML, and look forward to seeing you again tomorrow for another exciting edition of Don't Use Regex To Parse HTML, when someone posts another question about using regex to parse HTML. But that's all from Don't Use Regex To Parse HTML for now. Bye!
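For readers who want to see what "a couple of lines" might look like, here is a minimal sketch of extracting href values with the Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is referenced in the project; the class name `LinkParser` and the method name `GetHrefs` are hypothetical and only serve the illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class LinkParser
{
    // Hypothetical helper: returns the href value of every <a> element
    // that carries an href attribute.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

In the question's page code, the result could then be written to the literal with something like `litStatus.Text = string.Join(", ", LinkParser.GetHrefs(textBody));`.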

The following example searches an input string and prints out all the href="…" values and their locations in the string. It does this by constructing a compiled Regex object and then using a Match object to iterate through all the matches in the string. In this example, the metacharacter \s matches any whitespace character, and \S matches any non-whitespace character.

' VB

Imports System
Imports System.Text.RegularExpressions

Sub DumpHrefs(inputString As String)

    Dim r As Regex
    Dim m As Match

    ' Group 1 captures either a double-quoted value or an unquoted run of non-whitespace.
    r = New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    m = r.Match(inputString)
    While m.Success
        Console.WriteLine("Found href " & m.Groups(1).Value _
            & " at " & m.Groups(1).Index.ToString())
        m = m.NextMatch()
    End While

End Sub

// C#

using System;
using System.Text.RegularExpressions;

void DumpHrefs(String inputString) {

    Regex r;
    Match m;

    // Group 1 captures either a double-quoted value or an unquoted run of non-whitespace.
    r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase|RegexOptions.Compiled);
    for (m = r.Match(inputString); m.Success; m = m.NextMatch())
    {
        Console.WriteLine("Found href " + m.Groups[1] + " at "
            + m.Groups[1].Index);
    }

}

The second regex should be:

href=['"](?<link>[^'"]*)
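As a rough sketch of how that pattern could be used from C#, the snippet below wraps it in a small helper and reads the named group `link` from each match. The class name `HrefExtractor` and the method name `ExtractLinks` are made up for this example, and the pattern is written as a verbatim string, so each double quote is doubled.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // The pattern href=['"](?<link>[^'"]*) as a C# verbatim string.
    private static readonly Regex HrefPattern =
        new Regex(@"href=['""](?<link>[^'""]*)", RegexOptions.IgnoreCase);

    // Hypothetical helper: collects the "link" group of every match.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            links.Add(match.Groups["link"].Value);
        }
        return links;
    }
}
```

Called on the question's `textBody`, `ExtractLinks` returns a single entry, `http://www.link.com`. Note that the pattern expects a quoted value directly after `href=`, so unquoted attributes or whitespace around the equals sign would not match.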
