Trouble with scraping text from site using lxml / xpath()

Quick one: I'm new to lxml and have spent quite a while trying to scrape text from a particular site. The element structure is shown below:

http://tinypic.com/r/2iw7zaa/8

What I want to do is extract the value 100,100 shown in the highlighted area. Here is what I've tried. I saved the site's source to a text file, test.txt, to test against, and also tried it with an .html extension:

from lxml import html

# parse() expects a filename string (or file object); test.txt holds the saved page source
tree = html.parse('test.txt')
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]')
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]/text()')

All I get back is an empty list, []. Any help would be greatly appreciated.

P.S. I commented out the two value statements only to show what I tried. I tried several other XPath expressions similar to the ones above, but I lost them when the Python shell crashed.

P.P.S. Apologies for linking to the picture. My reputation is too low to post it directly.

Try removing `/tbody` from the XPath.

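The browser may insert a `<tbody>` element on its own when it builds the page, even though that tag never appears in the raw HTML that lxml parses. An XPath read off the browser's developer tools reflects the browser's DOM, so it contains `/tbody` even when the source does not. In that case it matches nothing in lxml's tree, and you get [].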
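Here is a minimal sketch of the effect. It assumes a made-up page whose tables are written without `<tbody>`. The markup below is a hypothetical stand-in for the saved page, with the target in `table[2]` rather than the question's `table[4]`. It compares the XPath with `/tbody`, the XPath without it, and a `//` variant that matches whether or not a `<tbody>` is present:

```python
from lxml import html

# Hypothetical stand-in for the saved page: tables written WITHOUT <tbody>,
# as raw server-generated HTML often is.
RAW_HTML = """
<html><body><div id="content">
  <table><tr><td>a</td><td>b</td></tr></table>
  <table><tr><td>Total</td><td>100,100</td></tr></table>
</div></body></html>
"""

tree = html.fromstring(RAW_HTML)

# An XPath copied from the browser includes /tbody, which lxml's HTML parser
# does not insert, so nothing matches.
with_tbody = tree.xpath('//*[@id="content"]/table[2]/tbody/tr[1]/td[2]/text()')

# Dropping /tbody follows the structure actually present in the raw markup.
without_tbody = tree.xpath('//*[@id="content"]/table[2]/tr[1]/td[2]/text()')

# Using // between table and tr matches with or without an intermediate <tbody>.
either_way = tree.xpath('//*[@id="content"]/table[2]//tr[1]/td[2]/text()')

print(with_tbody)
print(without_tbody)
print(either_way)
```

The `//` form is a reasonable fallback when you are unsure whether a given page really contains `<tbody>`. It is slightly looser, because it would also match rows inside nested tables.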

Read more here and here.
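A related debugging approach is to let lxml report the path to the element as lxml itself sees it, rather than trusting the browser's path. This sketch uses the same hypothetical markup and searches for the text node containing 100,100. It then calls `getpath()` on the parent element through the element tree. The path it returns is built from lxml's own tree, so it only contains a `tbody` step if the source really has one:

```python
from lxml import html

# Hypothetical stand-in for the saved page (no <tbody> in the raw markup).
RAW_HTML = """
<html><body><div id="content">
  <table><tr><td>a</td><td>b</td></tr></table>
  <table><tr><td>Total</td><td>100,100</td></tr></table>
</div></body></html>
"""

tree = html.fromstring(RAW_HTML)
root_tree = tree.getroottree()

# Locate every text node containing the target value, then ask lxml for the
# absolute path of the element that owns that text.
paths = [
    root_tree.getpath(text.getparent())
    for text in tree.xpath('//text()[contains(., "100,100")]')
]

for path in paths:
    print(path)                                  # lxml's own view of the path
    print(root_tree.xpath(path + '/text()'))     # reuse it to pull the value
```

With the real file, `html.parse('test.txt')` already returns an element tree, so `getpath()` and `xpath()` can be called on it directly.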
