Trouble with scraping text from site using lxml / xpath()

Question

Quick one: I'm new to lxml and have spent quite a while trying to scrape text data from a particular site. The element structure is shown in this screenshot:

http://tinypic.com/r/2iw7zaa/8

What I want to extract is the 100,100 shown within the highlighted area. Here are the statements I've tried (I saved the page source to a text file, test.txt, for testing, and also tried it with an .html extension):

from lxml import html
tree = html.parse('test.txt')  # the filename must be passed as a string
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]')
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]/text()')

All I get as a result is an empty list, []. Any help would be greatly appreciated.

PS: I commented out the two value statements only to show what I tried. I also tried a bunch of other XPath expressions similar to the ones above, but they were lost when the Python shell crashed.

PPS: Apologies for linking to the picture; I don't have enough reputation to post it directly.

Answer 1

Try removing `/tbody` from the XPath.
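A minimal sketch of the effect, assuming a hypothetical stand-in for the saved page (the real markup is only visible in the screenshot). Here the target cell sits in the first table, so the index is `[1]` rather than the question's `[4]`. The raw source has no `<tbody>`, so the browser-style path returns nothing while the same path without `/tbody` finds the cell:

```python
from lxml import html

# Hypothetical stand-in for the saved page: the raw source has no <tbody>,
# even though a browser's element inspector would show one.
RAW = """
<html><body>
  <div id="content">
    <table>
      <tr><td>Label</td><td>100,100</td></tr>
      <tr><td>Other</td><td>42</td></tr>
    </table>
  </div>
</body></html>
"""

tree = html.fromstring(RAW)

# Path written against the browser's DOM: includes /tbody, matches nothing here.
with_tbody = tree.xpath('//*[@id="content"]/table[1]/tbody/tr[1]/td[2]/text()')

# Same path with /tbody removed: matches the cell in the raw HTML.
without_tbody = tree.xpath('//*[@id="content"]/table[1]/tr[1]/td[2]/text()')

print(with_tbody)
print(without_tbody)
```

With the saved file, the same two expressions can be run on the tree returned by `html.parse('test.txt')`; the file extension does not matter to the parser.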

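The browser may insert a `<tbody>` element when it builds the DOM, even if the raw HTML doesn't contain one. An XPath written against the browser's developer tools then fails on the raw source that lxml parses.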
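If you are unsure whether the source you parse contains `<tbody>`, one option (a sketch, not the only approach) is to use the descendant axis `//` between `table` and `tr`, so the expression matches either way. The two HTML strings below are hypothetical variants of the same table:

```python
from lxml import html

# Two hypothetical versions of the same table: one as served (no <tbody>),
# one as a browser's DOM would present it (with <tbody> inserted).
RAW_NO_TBODY = "<html><body><div id='content'><table><tr><td>Label</td><td>100,100</td></tr></table></div></body></html>"
RAW_WITH_TBODY = "<html><body><div id='content'><table><tbody><tr><td>Label</td><td>100,100</td></tr></tbody></table></div></body></html>"

# table[1]//tr[1] selects the first <tr> child of the table or of any
# element inside it, so an optional <tbody> in between is skipped over.
XPATH = '//*[@id="content"]/table[1]//tr[1]/td[2]/text()'

def cell_text(source):
    """Return the text nodes matched by XPATH in the given HTML source."""
    return html.fromstring(source).xpath(XPATH)

results = [cell_text(RAW_NO_TBODY), cell_text(RAW_WITH_TBODY)]
print(results)
```

The trade-off is that `//` is looser: if a table contains nested tables, `//tr[1]` can also match rows inside them.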

Read more here and here.

solution1
1 ACCPTED 2014-09-29 15:40:50