I'm currently using the queries below to extract the text within the <h1>, <p> and <h3> tags.
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/h1");
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/p");
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/h3");
They sometimes come in a different order though, so I would like to catch them in order of appearance in the HTML. I did use
$xpath->query('//h1 | //p | //h3');
and that worked well too, but it also caught some <p> tags outside of the div class specified above. Running the three queries in sequence didn't work at all. Is there a way to combine these queries into one?
Basically, how do I extract all h1, p and h3 tags within a div with a specific class?
Why don't you try this:
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/*[local-name()='h1' or local-name()='p' or local-name()='h3']");
This should give you the nodes in order of appearance, restricted to children of the div parent. It is also valid XPath 1.0, which I assume is an unmentioned prerequisite.
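To see the single query in action, here is a minimal sketch using PHP's DOMDocument and DOMXPath. The sample markup and the variable names ($html, $query, $results) are assumptions made up for illustration, since the real page is not shown in the question.

```php
<?php
// Hypothetical sample markup; the real page is not shown in the question.
$html = <<<HTML
<html><body>
<p>outside paragraph</p>
<div class="grid_9 alpha omega newscontainer arena">
  <h3>Sub heading</h3>
  <p>First paragraph</p>
  <h1>Main heading</h1>
  <span>ignored</span>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// One query: direct children of the target div whose name is h1, p or h3.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]"
       . "/*[local-name()='h1' or local-name()='p' or local-name()='h3']";

// Collect "tag: text" pairs in the order the nodes are returned.
$results = [];
foreach ($xpath->query($query) as $node) {
    $results[] = $node->nodeName . ': ' . trim($node->textContent);
}

print_r($results);
```

With this sample input the <p> outside the div and the <span> inside it are skipped, and the three matching children come back in the order they appear in the markup.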
When you use //, the query matches every element with that tag name anywhere in the document.
You need to be more specific, so I suggest:
$xpath->query('//div/h1 | //div/p | //div/h3');
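Note that //div/h1 matches children of every div, not only the one from the question. A possible variant of the same union idea that keeps the class restriction is sketched below; the sample markup and the $div helper variable are assumptions for illustration only.

```php
<?php
// Hypothetical sample markup with a second, unrelated div.
$html = <<<HTML
<html><body>
<div class="sidebar"><p>sidebar paragraph</p></div>
<div class="grid_9 alpha omega newscontainer arena">
  <p>Intro</p>
  <h1>Title</h1>
  <h3>Section</h3>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Reuse the class predicate from the question for each branch of the union.
$div = "//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]";
$query = "$div/h1 | $div/p | $div/h3";

// Collect "tag: text" pairs in the order the nodes are returned.
$results = [];
foreach ($xpath->query($query) as $node) {
    $results[] = $node->nodeName . ': ' . trim($node->textContent);
}

print_r($results);
```

Here the paragraph in the sidebar div is not matched, because every branch of the union starts from the div with the specific class.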