
How to scrape all data from this element? Nodejs/Puppeteer

I want to collect the name, position, and type (online/in person) from elements like this one:

<div class="cse-userslist-user" data-user="178">
  <div class="cse-ul--img">
    <div class="cse-ul--img-child">
      <img src="https://secure.gravatar.com/avatar/99574b52aaa5ecb0bea650602fecfbd7?s=100&amp;d=mm&amp;r=g" alt="Dina Abdelma">
    </div>
  </div>
  <div class="cse-ul--content">
    <div class="cse-ul--name">Dina Abdelma</div>
    <div class="cse-ul--position">Head of SMEs, MDI</div>
    <div class="cse-ul--role">Online</div>
  </div>
  <div class="cse-ul-overlay">
    <div class="cse-ul-overlay-bg"></div>
    <a class="cse-open-popform cse-btn cse-btn--primary">
                                        Message                                    </a>
    <a href="#" class="cse-btn cse-btn--primary disabled">Schedule Meeting</a> </div>
</div>

I can get past the login page, but I can't scrape all the data; so far I have only scraped the first letter of one name.

Also, the number in data-user is always random; nothing else changes. I want to scrape the data from these three elements and put it into an array/Excel:

 <div class="cse-ul--name">Dina Abdelma</div>
 <div class="cse-ul--position">Head of SMEs, MDI</div>
 <div class="cse-ul--role">Online</div>
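
One way to keep the three fields aligned is to query each `.cse-userslist-user` card and read its children, rather than collecting names, positions, and roles in separate passes. The following is a minimal sketch, not a tested solution for this site. `extractUsers`, `scrapeUsers`, and `toCsv` are hypothetical helper names, and the CSV step assumes that a file Excel can open is enough for the "array/Excel" part.

```javascript
// Hypothetical helper: runs inside the browser when passed to page.$$eval,
// so it must not reference any variable defined outside its own body.
function extractUsers(cards) {
  const text = (card, selector) => {
    const el = card.querySelector(selector);
    return el ? el.textContent.trim() : '';
  };
  return cards.map(card => ({
    name: text(card, '.cse-ul--name'),
    position: text(card, '.cse-ul--position'),
    role: text(card, '.cse-ul--role'),
  }));
}

// Hypothetical helper: expects a Puppeteer page that is already logged in
// and already showing the user list.
async function scrapeUsers(page) {
  await page.waitForSelector('.cse-userslist-user');
  return page.$$eval('.cse-userslist-user', extractUsers);
}

// Hypothetical helper: turns the array into CSV text. Every field is quoted
// because values such as "Head of SMEs, MDI" contain commas.
function toCsv(rows) {
  const quote = value => `"${String(value).replace(/"/g, '""')}"`;
  const header = ['name', 'position', 'role'];
  const lines = rows.map(row => header.map(key => quote(row[key])).join(','));
  return [header.join(','), ...lines].join('\r\n');
}
```

With these in place, `const users = await scrapeUsers(page)` would give an array of `{ name, position, role }` objects, and `require('fs').writeFileSync('users.csv', toCsv(users))` would write them to a file (the file name is only an example).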

Here is my current code for logging in to the page (unrelated, it works):

  await page.waitForSelector('#username')
  await page.type('#username', login)
  await page.type('#password', password)
  await page.click('#ur-frontend-form > form > div > div > div > input')
  await page.waitForSelector('#cse-main > div > div > section.cse-section.cse-section--links > div > a:nth-child(2)')
  await page.click('#cse-main > div > div > section.cse-section.cse-section--links > div > a:nth-child(2)')
  await page.waitForSelector('#cse-main > div.cse-page.cse-page--networking.cse-global-bg > section.cse-section.cse-section--userslist > div > div.cse-userslist-button > a')
  await page.click('#cse-main > div.cse-page.cse-page--networking.cse-global-bg > section.cse-section.cse-section--userslist > div > div.cse-userslist-button > a')

Edit

  var names = await page.$$eval('.cse-ul--name',
  elements=> elements.map(item=>item.textContent))

This works, but it does not scrape all the data, only the entries that are currently visible.
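
If only the visible entries are returned, the remaining cards are probably not in the DOM yet. The question does not say how the page loads more entries, so the sketch below rests on an assumption: that clicking some "load more" style button appends further `.cse-userslist-user` cards. `loadAllCards` is a hypothetical helper, and both selectors are placeholders to be replaced with the real ones.

```javascript
// Hypothetical helper: keeps clicking a "load more" button until the number
// of cards stops growing, the button disappears, or maxRounds is reached.
// ASSUMPTION: the site appends cards on click; this is not confirmed by the question.
async function loadAllCards(page, { cardSelector, buttonSelector, maxRounds = 50, waitMs = 1000 }) {
  let previous = -1;
  for (let round = 0; round < maxRounds; round++) {
    const count = await page.$$eval(cardSelector, els => els.length);
    if (count === previous) break; // the last click added nothing
    previous = count;

    const button = await page.$(buttonSelector);
    if (!button) break; // nothing left to click
    await button.click();

    // Give the page time to append the new cards.
    await new Promise(resolve => setTimeout(resolve, waitMs));
  }
  return previous; // number of cards found in the DOM
}
```

After this resolves, a single `page.$$eval` call over the card selector would see every card that was loaded. If the page loads entries on scroll instead, the click would need to be replaced with a scroll inside `page.evaluate`.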

You can use Beautiful Soup:

from bs4 import BeautifulSoup 

soup = BeautifulSoup(html, 'html.parser') # html = the given html page from your question

# looks for a div, class='cse-ul--name' and decodes the contents of it
print(soup.find('div', 'cse-ul--name').decode_contents())

# looks for a div, class='cse-ul--position' and decodes the contents of it
print(soup.find('div', 'cse-ul--position').decode_contents())

# looks for a div, class='cse-ul--role' and decodes the contents of it
print(soup.find('div', 'cse-ul--role').decode_contents())

