
How can I collect this data from a div using Selenium and Python?

I have been using Selenium and Python to scrape a webpage, and I am having difficulty collecting the data I want from a div that has the following structure:

<div class="col span_6" style="margin-left: 12px;width: 47% !important;">
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Heading1</span>
    <span class="MainGridcolumn2">Text that I want</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Another heading</span>
    <span class="MainGridcolumn2">More text that I want</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Next heading</span>
    <span class="MainGridcolumn2">Even more text</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Yet another heading</span>
    <span class="MainGridcolumn2">Piece of text</span>
  </div>
</div>

The div has a number of rows, each with 2 columns containing the data/text inside span tags. There are no CSS ids.

I'm only interested in collecting the text contained within the spans of class 'MainGridcolumn2'.

I've tried the following to locate the first heading, intending to then use 'following-sibling' to move to the next span tag containing the text, but I can't even get this far: nothing is returned when I try to print the text to the console:

driver.find_element_by_xpath("//span['@class=MainGridcolumn1'][contains(text(), 'Heading1')]").text

and

driver.find_element_by_xpath("//span[contains(text(), 'Heading1')]").text

One way would be to get the enclosing div, i.e. the grandparent, and pull the spans from that:

h = """<div class="col span_6" style="margin-left: 12px;width: 47% !important;">
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Heading1</span>
    <span class="MainGridcolumn2">Text that I want</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Another heading</span>
    <span class="MainGridcolumn2">More text that I want</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Next heading</span>
    <span class="MainGridcolumn2">Even more text</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Yet another heading</span>
    <span class="MainGridcolumn2">Piece of text</span>
  </div>
</div>

  <div class="MainGridRow">
    <span class="MainGridcolumn1">Yet another heading</span>
    <span class="MainGridcolumn2">Piece of text I don't want</span>
  </div>"""

from lxml import html

xm = html.fromstring(h)
div = xm.xpath("//span[@class='MainGridcolumn1'][contains(text(), 'Heading1')]/../..")[0]
print(div.xpath(".//span[@class='MainGridcolumn2']/text()"))

Which would give you:

['Text that I want', 'More text that I want', 'Even more text', 'Piece of text']
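If only the value next to one known heading is needed, the following-sibling axis mentioned in the question can be applied directly to the heading span. The sketch below assumes the same markup; `snippet` and `value_for` are illustrative names that do not appear in the original post. Note the quote placement in the predicate: in `['@class=MainGridcolumn1']` the quotes wrap the whole test, so XPath treats it as a non-empty string literal that is always true, whereas `[@class='MainGridcolumn1']` actually compares the attribute.

```python
from lxml import html

# Trimmed copy of the markup from the question (assumed structure).
snippet = """<div class="col span_6">
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Heading1</span>
    <span class="MainGridcolumn2">Text that I want</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Another heading</span>
    <span class="MainGridcolumn2">More text that I want</span>
  </div>
</div>"""


def value_for(tree, heading):
    """Return the column-2 text next to the given column-1 heading, or None."""
    # $heading is an XPath variable that lxml fills from the keyword argument,
    # which avoids quoting problems inside the expression.
    hits = tree.xpath(
        "//span[@class='MainGridcolumn1'][contains(text(), $heading)]"
        "/following-sibling::span[@class='MainGridcolumn2']/text()",
        heading=heading,
    )
    return hits[0] if hits else None


tree = html.fromstring(snippet)
print(value_for(tree, "Heading1"))
```

Because `contains()` does substring matching, a heading whose text is part of another heading's text would match both rows; the function returns only the first hit.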

You could also just select the parent and get the parent's siblings:

from lxml import html

xm = html.fromstring(h)
div = xm.xpath("//span[@class='MainGridcolumn1'][contains(text(), 'Heading1')]/..")[0]
print(div.xpath(".//span[@class='MainGridcolumn2']/text() | ./following-sibling::div/span[@class='MainGridcolumn2']/text()"))
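To keep each value paired with its heading, the rows can also be walked one at a time. This is a minimal sketch rather than part of the original answer: `grid_rows` and `sample` are hypothetical names, and the function collects every `MainGridRow` in the string it is given, so unlike the grandparent approach it does not exclude rows outside the target div.

```python
from lxml import html


def grid_rows(page_source):
    """Return a list of (heading, value) tuples, one per MainGridRow div."""
    tree = html.fromstring(page_source)
    rows = []
    for row in tree.xpath("//div[@class='MainGridRow']"):
        # string(...) yields the text of the first matching span,
        # or an empty string if the span is missing.
        heading = row.xpath("string(span[@class='MainGridcolumn1'])").strip()
        value = row.xpath("string(span[@class='MainGridcolumn2'])").strip()
        rows.append((heading, value))
    return rows


# Stand-in for the real page; shortened copy of the markup from the question.
sample = """<div class="col span_6">
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Heading1</span>
    <span class="MainGridcolumn2">Text that I want</span>
  </div>
  <div class="MainGridRow">
    <span class="MainGridcolumn1">Another heading</span>
    <span class="MainGridcolumn2">More text that I want</span>
  </div>
</div>"""

print(grid_rows(sample))
```

With Selenium, the string passed in would typically be `driver.page_source`, assuming `driver` is a WebDriver that has already loaded the page. Alternatively, the XPath `//span[@class='MainGridcolumn2']` can be given to Selenium's own element lookup and `.text` read from each result; in Selenium 4 that lookup is `driver.find_elements(By.XPATH, ...)`.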

