
Using XPath for screen scraping

The following is the HTML:

    <div class="CatContent">
    <div class="LeftCon">
    <span class="mv"></span>
    <a href="http://movies.justdial.com/movies/Mumbai.html" target="_blank" onclick="_ct("psc_Movies","hmpg");">
    <p>
    </div>
    <div class="RightCon">
    </div>

I want to extract the text between the h1 tags, i.e. Movies.

What should the XPath be for extracting the text between the h1 tags?

This is what I am trying:

    Dim webGet = New HtmlWeb()
    Dim document = webGet.Load("http://www.asadsdsad.com/")
    Dim nodes = document.DocumentNode.SelectNodes("//*[@class='LeftCon']/a[@target='_blank']/h1")

    Dim _table As New Data.DataTable

    _table.Columns.Add("BusinessPIN", GetType(String))
    For i = 0 To nodes.Count - 1
        Dim _newRow As Data.DataRow = _table.NewRow
        _table.Rows.Add(nodes(i).InnerText)
    Next
    GridView1.DataSource = _table
    GridView1.DataBind()
    MsgBox(GridView1.Rows.Count)

I have tried many variations, but I always get "System.NullReferenceException: Object reference not set to an instance of an object."


The XPath //h1 will get you all the h1 elements in the document.

Iterate over the resulting collection of h1 nodes, and read the InnerText property of each HtmlNode to get its text.
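A minimal sketch of that approach, assuming the HtmlAgilityPack library that the question's HtmlWeb class comes from. The inline sample markup and the module name H1Example are made up for illustration; in particular, the position of the h1 inside the link is an assumption, since the posted HTML does not show it. The sketch also guards against SelectNodes returning Nothing, which is what HtmlAgilityPack returns when no node matches, and which is a likely source of the NullReferenceException when nodes.Count is read.

```vb
Imports System
Imports HtmlAgilityPack

Module H1Example
    Sub Main()
        ' Hypothetical sample markup, loaded from a string so no network access is needed.
        ' The placement of <h1> inside <a> is an assumption, not taken from the real page.
        Dim html As String = "<div class='LeftCon'>" &
            "<a href='#' target='_blank'><h1>Movies</h1></a>" &
            "</div>"

        Dim document As New HtmlDocument()
        document.LoadHtml(html)

        ' //h1 matches every h1 element anywhere in the document.
        Dim nodes As HtmlNodeCollection = document.DocumentNode.SelectNodes("//h1")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches,
        ' so check before touching nodes.Count or iterating.
        If nodes Is Nothing Then
            Console.WriteLine("No h1 elements found.")
            Return
        End If

        For Each node As HtmlNode In nodes
            ' InnerText is the text between the opening and closing tags.
            Console.WriteLine(node.InnerText.Trim())
        Next
    End Sub
End Module
```

If //h1 matches but the longer expression from the question does not, the h1 is probably not a direct child of the a element on the real page, and the path can then be narrowed step by step from //h1.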
