
What is the XPath to find only the first occurrence of a class tag in each div?

I'm trying to scrape some text from a website that has a list of products. What is the XPath to get the text of only the first occurrence of a class tag in each div? In the code below, I need the text of the first span with class "bar" inside each div with class "foo".

So I need the XPath that gives me only "year A", "year C", etc.

I'm new to this and have no idea how to do it. Many thanks for any help offered!

<div class="foo">                       
    <span class="bar">year A</span>
    <span class="qux">some text</span>
    <span class="bar">year B</span>
</div>

<div class="foo">                       
    <span class="bar">year C</span>
    <span class="qux">some text</span>
    <span class="bar">year D</span>
</div>

Etc.

With something like //span[@class='bar'][1]/text() one would only get "year A".

With something like //*[contains(@class, 'bar')]/text() one would get "year A", "year B", "year C" and "year D".

I'm scraping multiple pages and the number of items on each page is different. The class name "bar" is only used for the elements I need, so the problem described in "What is the XPath expression to find only the first occurrence?" does not apply.

This one worked fine in an XPath tester:

//div[@class='foo']/span[@class='bar'][1]/text()

or without text() if you don't really need it:

//div[@class='foo']/span[@class='bar'][1]
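
For readers who want to try the idea outside an XPath tester, here is a minimal sketch using only Python's standard library. It rests on several assumptions: the sample markup is wrapped in a hypothetical `<root>` element so it parses as well-formed XML, the helper name `first_bar_texts` is made up for illustration, and `xml.etree.ElementTree` supports only a subset of XPath. Instead of relying on the `[1]` predicate, the sketch selects every div "foo" and then takes the first matching span inside each one, which should give the same result for this markup. Real product pages are usually not well-formed XML and would need an HTML-aware parser.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root>
# element so that it parses as well-formed XML.
SAMPLE = """
<root>
  <div class="foo">
    <span class="bar">year A</span>
    <span class="qux">some text</span>
    <span class="bar">year B</span>
  </div>
  <div class="foo">
    <span class="bar">year C</span>
    <span class="qux">some text</span>
    <span class="bar">year D</span>
  </div>
</root>
"""


def first_bar_texts(markup):
    """Return the text of the first span "bar" child of every div "foo"."""
    root = ET.fromstring(markup)
    texts = []
    for div in root.iterfind(".//div[@class='foo']"):
        # find() returns only the first matching child, or None if absent.
        span = div.find("span[@class='bar']")
        if span is not None:
            texts.append(span.text)
    return texts


print(first_bar_texts(SAMPLE))  # ['year A', 'year C']
```

Because the loop runs once per div, it does not matter how many items a page contains.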

With //div[@class = 'foo']/span[@class = 'bar'][1] you would select, in each div whose class attribute is foo, the first child span whose class attribute is bar. If the class or name of the parent does not matter, then use //*/span[@class = 'bar'][1].
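
The parent-agnostic variant can be illustrated with a similar standard-library sketch. The markup below is hypothetical (the `<root>` wrapper, the `section` element with class "other", and the texts "year E" and "year F" are not from the question), and the helper name `first_bar_in_any_parent` is invented for this example. It visits every element and keeps the first child span "bar" of each, which is what the wildcard expression is meant to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one div "foo" plus a parent with another tag and class.
SAMPLE = """
<root>
  <div class="foo">
    <span class="bar">year A</span>
    <span class="bar">year B</span>
  </div>
  <section class="other">
    <span class="qux">some text</span>
    <span class="bar">year E</span>
    <span class="bar">year F</span>
  </section>
</root>
"""


def first_bar_in_any_parent(markup):
    """Return the text of the first span "bar" child of every element."""
    root = ET.fromstring(markup)
    texts = []
    # iter() walks every element in document order, whatever its tag or class.
    for parent in root.iter():
        span = parent.find("span[@class='bar']")
        if span is not None:
            texts.append(span.text)
    return texts


print(first_bar_in_any_parent(SAMPLE))  # ['year A', 'year E']
```

Note that "year E" is picked up even though its parent is not a div "foo", so the stricter expression is the safer choice when other parts of a page reuse the class name.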
