
How do I scrape nested data using Selenium and Python?

I basically want to scrape "Feb 2016 - Present", the value that follows <span class="visually-hidden">Dates Employed</span>, but I can't seem to get to it. Here's the HTML code:

<div class="pv-entity__summary-info">

<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>

<h4>
  <span class="visually-hidden">Company Name</span>
  <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
</h4>


  <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%">
      <span class="visually-hidden">Dates Employed</span>
      <span>Feb 2016 – Present</span>
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
        <span class="visually-hidden">Employment Duration</span>
        <span class="pv-entity__bullet-item">1 yr 2 mos</span>
      </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
      <span class="visually-hidden">Location</span>
      <span class="pv-entity__bullet-item">London, United Kingdom</span>
    </h4></div>

</div>

And here is what I've been doing with Selenium in my code so far:

from selenium.webdriver.common.by import By

date = browser.find_element(By.XPATH, './/div[@class = "pv-entity__duration de Sans-15px-black-55% ml0"]').text
print(date)

But this gives no results. How would I go about pulling the date?

There is no div with class="pv-entity__duration de Sans-15px-black-55% ml0"; that class is on an h4. If you want the text of the enclosing div, try:

from selenium.webdriver.common.by import By

date = browser.find_element(By.XPATH, './/div[@class = "pv-entity__position-info detail-facet m0"]').text
print(date)
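The mismatch is easy to confirm without a browser. Below is a minimal sketch, assuming lxml is installed, that runs the question's XPath and the same class test on h4 against a trimmed copy of the HTML from the question. SNIPPET and duration_class are illustrative names only.

```python
from lxml import html

# Trimmed copy of the markup from the question (illustrative test data).
SNIPPET = """
<div class="pv-entity__position-info detail-facet m0">
  <h4 class="pv-entity__date-range Sans-15px-black-55%">
    <span class="visually-hidden">Dates Employed</span>
    <span>Feb 2016 – Present</span>
  </h4>
  <h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
    <span class="visually-hidden">Employment Duration</span>
    <span class="pv-entity__bullet-item">1 yr 2 mos</span>
  </h4>
</div>
"""

root = html.fromstring(SNIPPET)
duration_class = "pv-entity__duration de Sans-15px-black-55% ml0"

# The question's expression: no <div> carries this class, so the result is empty.
div_matches = root.xpath(f'//div[@class="{duration_class}"]')
# The same class test on <h4> finds the element.
h4_matches = root.xpath(f'//h4[@class="{duration_class}"]')

print(len(div_matches))                         # 0
print(h4_matches[0].xpath("./span/text()")[1])  # 1 yr 2 mos
```

Note that @class="..." compares the whole attribute string, so the expression only matches when both the tag name and every class token (in the same order) agree with the page.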

If you want to get "Feb 2016 - Present", then try:

from selenium.webdriver.common.by import By

date = browser.find_element(By.XPATH, '//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span[2]').text
print(date)
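Because every value in this markup sits right after a visually-hidden label, another option is to anchor the XPath on the label text instead of on the presentational class names. The sketch below assumes lxml and that each label span is immediately followed by its value span, as in the question's HTML; value_for_label and LABEL_XPATH are hypothetical names. In Selenium the same idea should carry over by inlining the label and dropping the trailing /text(), e.g. passing '//span[@class="visually-hidden" and normalize-space()="Dates Employed"]/following-sibling::span[1]' to browser.find_element(By.XPATH, ...) and reading .text.

```python
from lxml import html

# Date/location part of the markup from the question (illustrative test data).
SNIPPET = """
<div class="pv-entity__position-info detail-facet m0">
  <h4 class="pv-entity__date-range Sans-15px-black-55%">
    <span class="visually-hidden">Dates Employed</span>
    <span>Feb 2016 – Present</span>
  </h4>
  <h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
    <span class="visually-hidden">Location</span>
    <span class="pv-entity__bullet-item">London, United Kingdom</span>
  </h4>
</div>
"""

# Find the hidden label by its text, then take the first <span> sibling after it.
LABEL_XPATH = (
    '//span[@class="visually-hidden" and normalize-space()=$label]'
    "/following-sibling::span[1]/text()"
)


def value_for_label(root, label):
    """Return the visible value paired with a visually-hidden label, or None."""
    # lxml binds the keyword argument `label` to the XPath variable $label,
    # so label text containing quotes needs no escaping.
    results = root.xpath(LABEL_XPATH, label=label)
    return results[0].strip() if results else None


root = html.fromstring(SNIPPET)
print(value_for_label(root, "Dates Employed"))  # Feb 2016 – Present
print(value_for_label(root, "Location"))        # London, United Kingdom
print(value_for_label(root, "Company Name"))    # None (not in this trimmed copy)
```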

You can also rewrite your XPath code along these lines, here using lxml:

# -*- coding: utf-8 -*-
from lxml import html


html_str = """
<div class="pv-entity__summary-info">

<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>

<h4>
  <span class="visually-hidden">Company Name</span>
  <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
</h4>


  <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%">
      <span class="visually-hidden">Dates Employed</span>
      <span>Feb 2016 – Present</span>
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
        <span class="visually-hidden">Employment Duration</span>
        <span class="pv-entity__bullet-item">1 yr 2 mos</span>
      </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
      <span class="visually-hidden">Location</span>
      <span class="pv-entity__bullet-item">London, United Kingdom</span>
    </h4></div>

</div>
"""

root = html.fromstring(html_str)
# Fetch "Feb 2016 – Present" (second span text node under the date-range <h4>):
txt = root.xpath('//h4[@class="pv-entity__date-range Sans-15px-black-55%"]/span/text()')[1]
# Fetch "1 yr 2 mos" (second span text node under the duration <h4>):
txt1 = root.xpath('//h4[@class="pv-entity__duration de Sans-15px-black-55% ml0"]/span/text()')[1]
print(txt)
print(txt1)

This prints:

Feb 2016 – Present
1 yr 2 mos
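To pull every facet at once, one option (a sketch, assuming lxml; position_facets and HAS_CLASS are hypothetical names) is to loop over the h4 elements inside the position-info div and pair each visually-hidden label with the span after it. The class test uses the common XPath 1.0 idiom contains(concat(" ", normalize-space(@class), " "), " name "), so it should keep matching even if the other, presentational class names such as Sans-15px-black-55% change.

```python
from lxml import html

# Position-info part of the markup from the question (illustrative test data).
SNIPPET = """
<div class="pv-entity__summary-info">
  <h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>
  <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%">
      <span class="visually-hidden">Dates Employed</span>
      <span>Feb 2016 – Present</span>
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
      <span class="visually-hidden">Employment Duration</span>
      <span class="pv-entity__bullet-item">1 yr 2 mos</span>
    </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
      <span class="visually-hidden">Location</span>
      <span class="pv-entity__bullet-item">London, United Kingdom</span>
    </h4></div>
</div>
"""

# Whole-token class test: true when the name is one of the space-separated classes.
HAS_CLASS = 'contains(concat(" ", normalize-space(@class), " "), " {} ")'


def position_facets(root):
    """Map each visually-hidden label to the value in the span that follows it."""
    facets = {}
    h4s = root.xpath("//div[" + HAS_CLASS.format("pv-entity__position-info") + "]/h4")
    for h4 in h4s:
        spans = h4.xpath("./span")
        if len(spans) >= 2:  # expect [label span, value span]
            facets[spans[0].text_content().strip()] = spans[1].text_content().strip()
    return facets


root = html.fromstring(SNIPPET)
for label, value in position_facets(root).items():
    print(f"{label}: {value}")
```

Keying the result on the label text keeps the extraction independent of each h4's position, which matters if some profiles omit a facet such as Location.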

