
Get the number from an href URL parameter in a downloaded HTML page?

I am trying to get an ID from a url parameter inside an href that looks like this:

<a href="http://www.mysite.com/myitem.php?id=71312">MyItemName</a>

I want the 71312 only, and at the moment I am trying to do it using regex (but if you have a better approach I would be glad to try):

        string html,itemID;
        using (var client = new WebClient())
        {
            html = client.DownloadString("http://www.mysite.com/search.php?search_text=" + myItemName);
        }

        string pattern = "<a href=\"http://www.mysite.com/myitem.php?id=(\d+)\">" + myItemName + "</a>";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        if (m.Success)
        {
            itemID = m.Groups[1].Value;
            MessageBox.Show(itemID);
        }

Example of the html:

more html body
<h1>Items - List</h1>
<p><a href="http://www.mysite.com/myitem.php?id=12313">MyItemNameTest</a>, <a href="http://www.mysite.com/myitem.php?id=83">MyItemNameTestB</a>, <a href="http://www.mysite.com/myitem.php?id=213784">MYItemNameOther</a></p>

</div>
more html body
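If you need every item on that list rather than one name at a time, a single Regex.Matches pass over the sample markup can collect all id/name pairs. This is a minimal sketch, assuming the markup looks exactly like the sample above (a double-quoted href as the only attribute, plain link text); ListItems is a hypothetical helper name, and the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The list paragraph from the sample HTML in the question.
string html =
    "<p><a href=\"http://www.mysite.com/myitem.php?id=12313\">MyItemNameTest</a>, " +
    "<a href=\"http://www.mysite.com/myitem.php?id=83\">MyItemNameTestB</a>, " +
    "<a href=\"http://www.mysite.com/myitem.php?id=213784\">MYItemNameOther</a></p>";

foreach (var (id, name) in ListItems(html))
    Console.WriteLine($"{id} -> {name}");

// Collects every (id, link text) pair for links to myitem.php.
// Assumes the exact markup shown above: a double-quoted href as the only
// attribute and plain link text with no nested tags.
static List<(string Id, string Name)> ListItems(string page)
{
    var items = new List<(string Id, string Name)>();
    var linkPattern = new Regex(
        @"<a href=""http://www\.mysite\.com/myitem\.php\?id=(\d+)"">([^<]*)</a>",
        RegexOptions.IgnoreCase);
    foreach (Match m in linkPattern.Matches(page))
        items.Add((m.Groups[1].Value, m.Groups[2].Value));
    return items;
}
```

Run against the sample, this prints one line per link: 12313 -> MyItemNameTest, 83 -> MyItemNameTestB and 213784 -> MYItemNameOther.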

To show where your regex went wrong:

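The characters . and ? are special in regular expressions: . means "any character" and ? means "zero or one occurrences of the previous expression". So myitem.php?id= in your pattern means "myitem, any character, ph, an optional p, then id=", which can never match the literal ? in the URL; that is why your regex fails to match. Also, you need to use verbatim strings in C# (unless you want to escape every backslash): in a regular string literal, \d is an unrecognized escape sequence and the code won't compile. Note that inside a verbatim string a double quote is written as "" rather than \", and it's safer to pass the item name through Regex.Escape in case it contains special characters itself:

@"<a href=""http://www\.mysite\.com/myitem\.php\?id=(\d+)"">" + Regex.Escape(myItemName) + "</a>";

will probably work.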
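To check the escaped pattern against the sample page, here is a minimal sketch that wraps it in a lookup function. FindItemId is a hypothetical name, and the HTML string is the sample from the question; the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

string html =
    "<p><a href=\"http://www.mysite.com/myitem.php?id=12313\">MyItemNameTest</a>, " +
    "<a href=\"http://www.mysite.com/myitem.php?id=83\">MyItemNameTestB</a>, " +
    "<a href=\"http://www.mysite.com/myitem.php?id=213784\">MYItemNameOther</a></p>";

Console.WriteLine(FindItemId(html, "MyItemNameTestB") ?? "not found");
Console.WriteLine(FindItemId(html, "MyItemName") ?? "not found");

// Returns the id of the link whose text is exactly itemName, or null.
// The URL's '.' and '?' are escaped in the pattern, and Regex.Escape
// neutralizes any metacharacters (such as '+' or '(') in the item name.
static string? FindItemId(string page, string itemName)
{
    string pattern =
        @"<a href=""http://www\.mysite\.com/myitem\.php\?id=(\d+)"">"
        + Regex.Escape(itemName) + "</a>";
    Match m = Regex.Match(page, pattern, RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value : null;
}
```

The first call prints 83. The second prints "not found", because the trailing </a> in the pattern requires the link text to be exactly the given name, so "MyItemName" does not match "MyItemNameTest" and friends.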

That said, unless all the links you're examining follow exactly this format (same quoting, same attribute order, no extra attributes or whitespace), you might run into problems. It's kind of a running gag here on SO that parsing HTML with regular expressions will earn you the wrath of Cthulhu.
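To make the "exactly this format" caveat concrete, this sketch runs a strict pattern like the one above against a few variations of the same link. The variants are hypothetical examples of markup a real page might produce, not taken from the question's site; the code is a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// A strict, exact-format pattern with the item name filled in.
const string strictPattern =
    @"<a href=""http://www\.mysite\.com/myitem\.php\?id=(\d+)"">MyItemName</a>";

// Hypothetical variations of the same link.
string[] variants =
{
    "<a href=\"http://www.mysite.com/myitem.php?id=71312\">MyItemName</a>",
    "<a href='http://www.mysite.com/myitem.php?id=71312'>MyItemName</a>",
    "<a class=\"item\" href=\"http://www.mysite.com/myitem.php?id=71312\">MyItemName</a>",
    "<a href=\"/myitem.php?id=71312\">MyItemName</a>",
    "<a href=\"http://www.mysite.com/myitem.php?id=71312\"> MyItemName </a>",
};

foreach (string v in variants)
{
    bool ok = Regex.IsMatch(v, strictPattern, RegexOptions.IgnoreCase);
    Console.WriteLine($"{(ok ? "match   " : "no match")}  {v}");
}
```

Only the first variant matches; single quotes, an extra attribute, a relative URL or whitespace around the link text are each enough to make the lookup silently fail.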

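Use Uri together with HttpUtility.ParseQueryString instead:

using System;
using System.Web; // HttpUtility

Uri u = new Uri("http://www.mysite.com/myitem.php?id=12313");
string s = u.Query;                                    // "?id=12313"
string id = HttpUtility.ParseQueryString(s).Get("id"); // "12313"

The variable id now holds the number (as a string). Figure out the rest of the function :)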
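One way to fill in the rest is to wrap the Uri/ParseQueryString approach in a function that also converts the value to an int. This is a minimal sketch, assuming the href has already been pulled out of the page and is an absolute URL; GetItemId is a hypothetical name. HttpUtility lives in the System.Web namespace (on .NET Framework that means referencing System.Web.dll), and the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Web; // HttpUtility

Console.WriteLine(GetItemId("http://www.mysite.com/myitem.php?id=12313"));
Console.WriteLine(GetItemId("http://www.mysite.com/myitem.php?page=2&id=83"));

// Returns the numeric "id" query parameter of an absolute URL,
// or null if the URL is invalid, has no id, or the id is not an integer.
static int? GetItemId(string href)
{
    if (!Uri.TryCreate(href, UriKind.Absolute, out Uri? uri))
        return null;

    // uri.Query includes the leading '?'; ParseQueryString skips it.
    string? value = HttpUtility.ParseQueryString(uri.Query).Get("id");

    // int.TryParse returns false for null or non-numeric values.
    return int.TryParse(value, out int id) ? id : (int?)null;
}
```

Unlike the regex, this does not care where id appears among the query parameters, which is the main advantage of parsing the URL instead of matching its text.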
