
xpath doesn't work in this website

I am scraping individual listing pages from justproperty.com (individual listing from the original question no longer active).

I want to get the value of the Ref field.

This is my XPath:

>>> sel.xpath('normalize-space(.//div[@class="info_div"]/table/tbody/tr/td[normalize-space(text())="Ref:"]/following-sibling::td[1]/text())').extract()[0]

This returns no results in Scrapy, despite working in my browser.

The following works perfectly in lxml.html (which modern Scrapy uses):

sel.xpath('.//div[@class="info_div"]//td[text()="Ref:"]/following-sibling::td[1]/text()')

Note that I'm using // to get between the div and the td, not laying out the explicit path. I'd have to take a closer look at the document to grok why, but the path given in that area was incorrect.
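To make the difference concrete, here is a minimal sketch using lxml.html directly. The markup in SAMPLE is hypothetical, modelled on the XPath in the question (the original listing is no longer available), and the value AB-1234 is made up for illustration.

```python
import lxml.html

# Hypothetical markup modelled on the question; note that the source
# contains no <tbody> element between <table> and <tr>.
SAMPLE = """
<html><body>
<div class="info_div">
  <table>
    <tr><td class="titles">Ref:</td><td>AB-1234</td></tr>
    <tr><td class="titles">Type:</td><td>Villa</td></tr>
  </table>
</div>
</body></html>
"""

doc = lxml.html.document_fromstring(SAMPLE)

# Explicit path including a tbody step, as in the question.
explicit = doc.xpath(
    './/div[@class="info_div"]/table/tbody/tr/td[text()="Ref:"]'
    '/following-sibling::td[1]/text()'
)

# Descendant shortcut (//) between the div and the td.
descendant = doc.xpath(
    './/div[@class="info_div"]//td[text()="Ref:"]'
    '/following-sibling::td[1]/text()'
)

print(explicit)    # []
print(descendant)  # ['AB-1234']
```

The // form matches the td whatever elements sit between it and the div, so it does not depend on the exact nesting of the table.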

Don't create XPath expressions by looking at Firebug or Chrome Dev Tools; they change the markup. Remove the /tbody axis step and you'll receive exactly what you're looking for.

normalize-space(.//div[@class="info_div"]/table/tr/td[
  normalize-space(text())="Ref:"
]/following-sibling::td[1]/text())

Read "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?" for more details.
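One way to check whether an element seen in the browser inspector is really present in the page source is to scan the raw HTML for it before writing it into an XPath. The sketch below uses only the standard library; TagCollector and source_has_tag are hypothetical helper names, and raw_html is made-up sample markup (in Scrapy, the raw source would presumably come from response.text).

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag found in the raw source."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def source_has_tag(html_source, tag):
    """Return True if the raw HTML contains at least one <tag> element."""
    collector = TagCollector()
    collector.feed(html_source)
    collector.close()
    return tag in collector.tags


# Hypothetical raw source, as the server might send it (no <tbody>).
raw_html = (
    '<div class="info_div"><table>'
    '<tr><td>Ref:</td><td>AB-1234</td></tr>'
    '</table></div>'
)

print(source_has_tag(raw_html, "tbody"))  # False
print(source_has_tag(raw_html, "table"))  # True
```

If the check reports no tbody in the source, any tbody step copied from the browser's element inspector should be dropped from the expression.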

Another XPath that gets the same thing: (.//td[@class='titles']/../td[2])[1]

I tried your XPath using XPath Checker and it works fine.
