
Windows Phone web scraping

I'm trying to scrape data from a webpage. Using Html Agility Pack, I can load the particular div that I want to display, but this div node contains other child nodes. How can I extract the InnerHtml of each child node? Here's what I've done so far:

var webget = new HtmlWeb();
var doc = webget.Load("http://www.dmp.gov.bd/application/index/pressdetails/press_159");

HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='span8 inner_mess']");

Here I'm pointing at one specific page. The URL won't always be the same, but the div is guaranteed to be the same, and it will contain different child nodes depending on the URL.

If I could find out in code which child nodes that div contains, I could probably work out the rest.

Do you want to walk the nodes recursively? (I can't tell whether the output is correct, because I only speak English.) You can add indentation and line breaks to tidy up the output.

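private void button1_Click(object sender, EventArgs e)
{
    var webget = new HtmlWeb();
    var doc = webget.Load("http://www.dmp.gov.bd/application/index/pressdetails/press_159");

    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='span8 inner_mess']");

    // SelectSingleNode returns null when nothing matches the XPath.
    if (node == null)
    {
        return;
    }

    TraverseNodes(node.ChildNodes);
}

private void TraverseNodes(HtmlNodeCollection nodes)
{
    foreach (HtmlNode node in nodes)
    {
        // Append text only at text nodes: an element's InnerText already
        // contains the text of all its descendants, so appending it at
        // every level would repeat the same text.
        if (node.NodeType == HtmlNodeType.Text)
        {
            textBox1.Text += node.InnerText;
        }

        TraverseNodes(node.ChildNodes);
    }
}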
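
If you only need the direct children of the div and the InnerHtml of each one, as the question asks, something like the following might be enough. This is only a sketch: the class name ChildNodeDemo and the inline HTML string are made up for illustration (the real page's markup will differ), and it writes to the console rather than to a text box so that it is self-contained.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class ChildNodeDemo
{
    static void Main()
    {
        // Hypothetical stand-in markup; the real page content will differ.
        const string html =
            "<div class='span8 inner_mess'>" +
            "<h3>Title</h3>" +
            "<p>First <b>paragraph</b></p>" +
            "</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode div = doc.DocumentNode.SelectSingleNode("//div[@class='span8 inner_mess']");
        if (div == null)
        {
            Console.WriteLine("div not found");
            return;
        }

        // Keep only element children; whitespace between tags shows up as text nodes.
        foreach (HtmlNode child in div.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element))
        {
            // Name is the tag name, InnerHtml is the markup inside that child.
            Console.WriteLine(child.Name + ": " + child.InnerHtml);
        }
    }
}
```

Checking child.Name inside the loop is one way to decide how to handle each kind of child (for example, treating p and table differently) once you know which tags actually appear.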


 