
XPath: get text from all h1, p and h3 tags within a div

I'm currently using the queries below to extract the text within the <h1>, <p> and <h3> tags.

$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/h1");
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/p");
$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/h3");

They do sometimes come in a different order though, so I would like to catch them in order of appearance in the HTML. I did use

$xpath->query('//h1 | //p | //h3');

and that worked well too, but it also caught some <p> tags outside of the div class specified above. Running the three queries one after another didn't work at all. Is there a way to combine these queries into one?

Basically, I want to extract all h1, p and h3 tags within a specific div class.

Why don't you try

$xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]/*[local-name()='h1' or local-name()='p' or local-name()='h3']");

This should give you the nodes in the order of their appearance, restricted to children of the div parent. It also works in XPath 1.0, which I assume is an unmentioned prerequisite.
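For illustration, here is a minimal end-to-end sketch of that approach, assuming PHP's DOM extension is available. The sample markup in `$html` is made up; only the class string comes from the question. It uses `self::h1` as an equivalent spelling of the `local-name()='h1'` test for unnamespaced HTML.

```php
<?php
// Hypothetical sample markup; only the class string comes from the question.
$html = <<<HTML
<html><body>
<p>outside</p>
<div class="grid_9 alpha omega newscontainer arena">
  <h3>Sub</h3>
  <p>First</p>
  <h1>Title</h1>
  <span>ignored</span>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Class predicate copied from the question, stored once for readability.
$div = "//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]";

// Select only h1, p and h3 elements that are direct children of the div.
$nodes = $xpath->query($div . "/*[self::h1 or self::p or self::h3]");

$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->nodeName . ': ' . trim($node->textContent);
}

print_r($texts);
```

With this sample input the list holds the h3, p and h1 entries in that order, matching their position in the markup, and neither the outer <p> nor the <span> is included.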

When you use //, the query matches any element with that tag name, anywhere in the document.

You must be more specific. I suggest:

$xpath->query('//div/h1 | //div/p | //div/h3');
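Note that //div/p still matches the children of every div, not just the one in the question. A possible way to narrow the union, sketched here under the assumption of PHP's DOM extension and a made-up HTML sample, is to reuse the question's class predicate in each branch. The variable names are illustrative.

```php
<?php
// Hypothetical sample markup; only the class string comes from the question.
$html = <<<HTML
<html><body>
<div class="sidebar"><p>unrelated</p></div>
<div class="grid_9 alpha omega newscontainer arena">
  <p>Intro</p>
  <h1>Title</h1>
  <h3>Sub</h3>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Class predicate copied from the question.
$div = "//div[contains(concat(' ', normalize-space(@class), ' '), ' grid_9 alpha omega newscontainer arena ')]";

// Union over children of any div: also picks up the sidebar paragraph.
$broad = $xpath->query('//div/h1 | //div/p | //div/h3');

// Union restricted to the target div in every branch.
$narrow = $xpath->query("$div/h1 | $div/p | $div/h3");

$broadTexts = [];
foreach ($broad as $node) {
    $broadTexts[] = trim($node->textContent);
}

$narrowTexts = [];
foreach ($narrow as $node) {
    $narrowTexts[] = trim($node->textContent);
}

print_r($broadTexts);
print_r($narrowTexts);
```

On this sample the broad query includes the sidebar paragraph, while the narrowed one returns only the three elements inside the target div.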
