Windows Phone web scraping

I'm trying to scrape data from a web page. Using HtmlAgilityPack, I can load the particular div I want to display, but that div node contains other child nodes. How can I extract the InnerHtml of each child node? Here's what I've done so far:

var webget = new HtmlWeb();
var doc = webget.Load("http://www.dmp.gov.bd/application/index/pressdetails/press_159");

HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='span8 inner_mess']");

Here I'm pointing at one specific web page. The URL won't always be the same, but the div is guaranteed to be the same on every page; only the child nodes inside it differ depending on the URL.

If I could find out in code which child nodes are present in that particular div, I could probably work out the rest.

Do you want to traverse the nodes recursively? (I can't tell whether the output is right for this page, because I only read English.) You can add indentation and line breaks to make the output easier to read.

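private void button1_Click(object sender, EventArgs e)
{
    var webget = new HtmlWeb();
    var doc = webget.Load("http://www.dmp.gov.bd/application/index/pressdetails/press_159");

    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='span8 inner_mess']");

    // SelectSingleNode returns null when nothing matches the XPath.
    if (node != null)
    {
        TraverseNodes(node.ChildNodes);
    }
}

// Walks the subtree depth-first and appends the text of every text node.
private void TraverseNodes(HtmlNodeCollection nodes)
{
    foreach (HtmlNode node in nodes)
    {
        // Append only text nodes: an element's InnerText already contains
        // the text of all its descendants, so appending it at every level
        // would repeat the same text once per nesting level.
        if (node.NodeType == HtmlNodeType.Text)
        {
            textBox1.Text += node.InnerText;
        }

        TraverseNodes(node.ChildNodes);
    }
}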
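
If you only need to know which child nodes sit directly inside the div, and the InnerHtml of each, a non-recursive loop over ChildNodes may be enough. The following is a rough sketch, not tested against the real page: the ChildNodeLister class, its Describe method, and the sample markup are made up for illustration, and the HTML is loaded from a string so the example does not depend on the site being reachable.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ChildNodeLister
{
    // Returns one "name: inner HTML" entry per direct element child
    // of the first node matched by the XPath expression.
    public static List<string> Describe(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        HtmlNode container = doc.DocumentNode.SelectSingleNode(xpath);
        if (container == null)
        {
            return result; // the XPath matched nothing
        }

        foreach (HtmlNode child in container.ChildNodes)
        {
            // Skip whitespace text nodes and comments; keep real elements only.
            if (child.NodeType != HtmlNodeType.Element)
            {
                continue;
            }
            result.Add(child.Name + ": " + child.InnerHtml);
        }
        return result;
    }

    public static void Main()
    {
        // Assumed markup standing in for the real page content.
        string html = "<div class='span8 inner_mess'>" +
                      "<h3>Title</h3><div>Body <b>text</b></div></div>";

        foreach (string line in Describe(html, "//div[@class='span8 inner_mess']"))
        {
            Console.WriteLine(line);
        }
    }
}
```

With the real page you would pass the downloaded HTML instead of the sample string. Checking child.Name first tells you which kinds of elements a given URL produced before you decide how to display each one.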
