简体   繁体   中英

How to extract the web address from html string using Regular Expression

I need to extract the web address from this string:

<p> Feb 24 - <a href="http://austin.daylife.org/apa/2867907745.html">$390 / 2br - 600ft&sup2; - Sleeps 4-Walk to SXSW-SOCO-Perfect Location</a> - <font size="-1"> (South 5th)</font> <span class="p"> pic</span></p>

How can I achieve the same using regular expression in C#?

Use this regular expression:

http(s)?://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&amp;\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?

EDIT: Simpler expression:

http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?

This works for me:

        string source = " <p> Feb 24 - <a href=\"http://austin.daylife.org/apa/2867907745.html\">$390 / 2br - 600ft&sup2; - Sleeps 4-Walk to SXSW-SOCO-Perfect Location</a> - <font size=\"-1\"> (South 5th)</font> <span class=\"p\"> pic</span></p> ";
        Regex regex = new Regex("<a[^>]*? href=\"(?<url>[^\"]+)\"[^>]*?>(?<text>.*?)</a>");
        var m = regex.Match(source);
        string url = m.Groups["url"];

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM