What would be the correct XPath syntax for this?
//footer//a | (//a[not(//footer)] and position() <=200)
Use only the links under //footer if a footer exists; if not, find all //a that are not in //footer and limit the result to 200.
You were really close. The | (union) operator already handles your case: if the footer contains no <a> nodes, the first branch matches nothing and only the nodes from the second branch are returned.
Using Python and parsel (the selector library used by Scrapy):
>>> from parsel import Selector
>>> foo = Selector(text="<footer><a>text</a></footer>")
>>> bar = Selector(text="<div><a>text</a><a>text2</a><a>text3</a><a>text4</a></div>")
>>> foo.xpath("//footer//a | (//a)[position()<=2]").getall()
['<a>text</a>']
>>> bar.xpath("//footer//a | (//a)[position()<=2]").getall()
['<a>text</a>', '<a>text2</a>']
Note: I used 2 instead of 200 for brevity.
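Two caveats about the union approach. First, | always returns the matches of both branches, so on a page that has footer links and other links it returns all of them, not only the footer ones. Second, //a[position()<=200] counts each <a> among its siblings, whereas (//a)[position()<=200] takes the first 200 <a> elements in document order. For strict either/or behaviour in a single XPath 1.0 expression, something like //footer//a | (//a)[not(//footer//a)][position()<=200] should work (an untested sketch). Alternatively, the branching can be done in Python. The minimal sketch below uses only the standard library's xml.etree.ElementTree and assumes well-formed markup; select_links and its limit parameter are hypothetical names, and real-world HTML would need parsel or lxml instead.

```python
import xml.etree.ElementTree as ET


def select_links(markup: str, limit: int = 200) -> list[str]:
    """Hypothetical helper: footer links if any exist, else the first `limit` links.

    Assumes `markup` is well-formed XML/XHTML; real-world HTML would need an
    HTML parser such as parsel or lxml instead of xml.etree.
    """
    root = ET.fromstring(markup)

    # Element.iter() visits the element itself and all its descendants in
    # document order, so a root-level <footer> is found too.
    # dict.fromkeys removes duplicates (e.g. from nested footers) while keeping order.
    footer_links = list(dict.fromkeys(
        a for footer in root.iter("footer") for a in footer.iter("a")
    ))

    if footer_links:
        # Footer links exist: use only those (the question applies no limit here).
        chosen = footer_links
    else:
        # No footer links, so no <a> can be inside a footer; take the first
        # `limit` links in document order, like (//a)[position()<=limit].
        chosen = list(root.iter("a"))[:limit]

    # Return the link text; use "" for <a> elements without text.
    return [a.text or "" for a in chosen]


print(select_links("<footer><a>text</a></footer>"))
print(select_links("<div><a>text</a><a>text2</a><a>text3</a><a>text4</a></div>", limit=2))
print(select_links("<body><div><a>main</a></div><footer><a>foot</a></footer></body>"))
```

With this helper, a page containing both <div><a>main</a></div> and <footer><a>foot</a></footer> yields only ['foot'], while the union expression would return both links.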