How can I collect this data from a div using Selenium and Python?
I have been using Selenium and Python to scrape a webpage, and I am having difficulty collecting the data I want from a div with the following structure:
<div class="col span_6" style="margin-left: 12px;width: 47% !important;">
<div class="MainGridRow">
<span class="MainGridcolumn1">Heading1</span>
<span class="MainGridcolumn2">Text that I want</span>
</div>
<div class="MainGridRow">
<span class="MainGridcolumn1">Another heading</span>
<span class="MainGridcolumn2">More text that I want</span>
</div>
<div class="MainGridRow">
<span class="MainGridcolumn1">Next heading</span>
<span class="MainGridcolumn2">Even more text</span>
</div>
<div class="MainGridRow">
<span class="MainGridcolumn1">Yet another heading</span>
<span class="MainGridcolumn2">Piece of text</span>
</div>
</div>
The div has a number of rows, each with two columns containing the data/text inside span tags. There are no CSS ids.
I'm only interested in collecting the text contained within the spans with class 'MainGridcolumn2'.
I've tried the code below to navigate to the first heading, intending to then use 'following-sibling' to move to the next span tag containing the text, but I can't even get this to work: it returns no text when I try to print it to the console:
driver.find_element_by_xpath("//span['@class=MainGridcolumn1'][contains(text(), 'Heading1')]").text
and
driver.find_element_by_xpath("//span[contains(text(), 'Heading1')]").text
One way would be to get the enclosing div, i.e. the grandparent, and pull the spans from that:
h = """<div class="col span_6" style="margin-left: 12px;width: 47% !important;">
<div class="MainGridRow">
<span class="MainGridcolumn1">Heading1</span>
<span class="MainGridcolumn2">Text that I want</span>
</div>
<div class="MainGridRow">
<span class="MainGridcolumn1">Another heading</span>
<span class="MainGridcolumn2">More text that I want</span>
</div>
<div class="MainGridRow">
<span class="MainGridcolumn1">Next heading</span>
<span class="MainGridcolumn2">Even more text</span>
</div>
<div class="MainGridRow">
<span class="MainGridcolumn1">Yet another heading</span>
<span class="MainGridcolumn2">Piece of text</span>
</div>
</div>
<div class="MainGridRow">
<span class="MainGridcolumn1">Yet another heading</span>
<span class="MainGridcolumn2">Piece of text I don't want</span>
</div>"""
from lxml import html
xm = html.fromstring(h)
div = xm.xpath("//span[@class='MainGridcolumn1'][contains(text(), 'Heading1')]/../..")[0]
print(div.xpath(".//span[@class='MainGridcolumn2']/text()"))
Which would give you:
['Text that I want', 'More text that I want', 'Even more text', 'Piece of text']
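To apply this to a page loaded in Selenium rather than a literal string, one option is to hand the rendered HTML to lxml. The following is a minimal sketch, assuming `driver` is a WebDriver that has already loaded the page; `column2_texts` is a hypothetical helper name and is not part of either library:

```python
from lxml import html


def column2_texts(page_source, heading="Heading1"):
    """Return the MainGridcolumn2 texts from the grid whose first column contains `heading`."""
    tree = html.fromstring(page_source)
    # Find the heading span, then step up twice to the enclosing (grandparent) div.
    # $heading is an XPath variable filled from the keyword argument.
    grids = tree.xpath(
        "//span[@class='MainGridcolumn1'][contains(text(), $heading)]/../..",
        heading=heading,
    )
    if not grids:
        return []
    # Collect the second-column text from every row inside that div.
    texts = grids[0].xpath(".//span[@class='MainGridcolumn2']/text()")
    return [t.strip() for t in texts]


# Stand-in for driver.page_source, trimmed to two rows for illustration.
sample = """<html><body>
<div class="col span_6">
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Heading1</span>
    <span class="MainGridcolumn2">Text that I want</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Another heading</span>
    <span class="MainGridcolumn2">More text that I want</span>
  </div>
</div>
</body></html>"""

print(column2_texts(sample))
# With Selenium (assumed setup): column2_texts(driver.page_source)
```

Parsing the page source once and querying it with lxml avoids a round trip to the browser for each element, at the cost of working on a snapshot of the page.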
You could also just select the parent and get the parent's siblings:
from lxml import html
xm = html.fromstring(h)
div = xm.xpath("//span[@class='MainGridcolumn1'][contains(text(), 'Heading1')]/..")[0]
print(div.xpath(".//span[@class='MainGridcolumn2']/text() | ./following-sibling::div/span[@class='MainGridcolumn2']/text()"))
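On the first expression tried in the question: the predicate ['@class=MainGridcolumn1'] is wrapped in quotes, so XPath treats it as a non-empty string literal, which is always true, and no filtering by class takes place. That alone may not explain the empty text, since the second attempt has no such predicate, but it is worth correcting. Here is a small sketch with lxml on a made-up two-span fragment (the 'Other' class is invented for illustration):

```python
from lxml import html

# Two spans with the same text but different classes.
doc = html.fromstring("""
<div>
  <span class="MainGridcolumn1">Heading1</span>
  <span class="Other">Heading1 lookalike</span>
</div>""")

# Quoted predicate: a non-empty string is always true, so both spans match.
quoted = doc.xpath("//span['@class=MainGridcolumn1'][contains(text(), 'Heading1')]")

# Corrected predicate: compares the class attribute with the literal value.
fixed = doc.xpath("//span[@class='MainGridcolumn1'][contains(text(), 'Heading1')]")

print(len(quoted))  # 2
print(len(fixed))   # 1
```

If Selenium's .text still comes back empty with a correct locator, one thing to check is whether the element is actually rendered, since .text returns only visible text, whereas get_attribute('textContent') returns the text regardless of visibility.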