
XPath - get text from all h1, h3, and p tags within a div

I'm currently using the queries below to extract the text within the <h1>, <p>, and <h3> tags:

$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/h1");
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/p");
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/h3");

They sometimes come in a different order, though, so I would like to catch them in order of appearance in the HTML. I did try

$xpath->query('//h1 | //p | //h3');

and that worked well too, but it also caught some <p> tags outside the div with the class specified above. Running the three queries one after another didn't work at all. Is there a way to combine these queries into one?

Basically, I want to extract all h1, p, and h3 tags within a div that has a specific class.

Why don't you try

$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/*[local-name()='h1' or local-name()='p' or local-name()='h3']");

This should give you the nodes in order of appearance, restricted to children of the matching div. It is also valid XPath 1.0, which I assume is an unmentioned prerequisite.
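As a rough end-to-end sketch of that query, the snippet below runs it against a small made-up page. The sample markup and the variable names ($html, $div, $texts) are assumptions for illustration; only the class string and the XPath expression come from the posts above.

```php
<?php
// Hypothetical sample markup; only the class string comes from the question.
$html = <<<HTML
<html><body>
<p>Outside paragraph</p>
<div class="grid_9 alpha omega newscontainer arena">
  <h1>Headline</h1>
  <p>First paragraph</p>
  <h3>Subheading</h3>
  <p>Second paragraph</p>
  <span>Ignored</span>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Predicate selecting the div by its class attribute, as in the question.
$div = "//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]";

// A single query: direct children of that div named h1, p, or h3.
$nodes = $xpath->query($div . "/*[local-name()='h1' or local-name()='p' or local-name()='h3']");

// Collect "tag: text" pairs in the order the nodes are returned.
$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->nodeName . ': ' . trim($node->textContent);
}
print_r($texts);
```

With this sample input, the paragraph outside the div and the <span> inside it are skipped, and the remaining four elements come back in the order they appear in the markup.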

When you use // on its own, it will match any element with that tag name, anywhere in the document.

You must be more specific, and I suggest:

$xpath->query('//div/h1 | //div/p | //div/h3');
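Note that the union above matches children of any div. One way to narrow it to the div from the question, sketched here under the assumption that the class predicate is kept in a hypothetical $div variable, is to prefix each branch of the union with that predicate. The sample markup is made up for illustration.

```php
<?php
// Hypothetical sample markup: two divs, only one with the class string from the question.
$html = <<<HTML
<html><body>
<div class="sidebar"><p>Sidebar paragraph</p></div>
<div class="grid_9 alpha omega newscontainer arena">
  <h3>Subheading</h3>
  <p>Body paragraph</p>
  <h1>Headline</h1>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Predicate selecting the div by its class attribute, as in the question.
$div = "//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]";

// Union over every div: also picks up the sidebar paragraph.
$loose = $xpath->query('//div/h1 | //div/p | //div/h3');

// Union with each branch restricted to the target div.
$strict = $xpath->query("$div/h1 | $div/p | $div/h3");

$strictTexts = [];
foreach ($strict as $node) {
    $strictTexts[] = trim($node->textContent);
}

echo $loose->length, "\n";
print_r($strictTexts);
```

Here the loose union returns four nodes, including the sidebar paragraph, while the restricted union returns only the three children of the target div.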
