Click on all 'see more' buttons and scrape all data from a LinkedIn profile using selenium and beautifulsoup

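I am writing a generic Python script to scrape any LinkedIn profile. In the Experience section, the job descriptions are often hidden and can only be revealed by clicking 'see more'. I want to click every 'see more' button and scrape the full descriptions. I tried the following code to open the profile, scroll the page and click the buttons: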

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Creating an instance (Selenium 4 API; the raw string keeps the Windows backslashes literal)
driver = webdriver.Chrome(service=Service(r"C:\WebDrivers\chromedriver.exe"))

# Logging into LinkedIn
driver.get("https://linkedin.com/uas/login")
time.sleep(5)

username = driver.find_element(By.ID, "username")
username.send_keys("your login email")  # Enter your email address

pword = driver.find_element(By.ID, "password")
pword.send_keys("your password")  # Enter your password

driver.find_element(By.XPATH, "//button[@type='submit']").click()

# Opening the profile
# paste the URL of the profile here
profile_url = "https://www.linkedin.com/in/warikoo/"

driver.get(profile_url)  # this will open the link

start = time.time()

# vertical pixel offset to scroll to; grows by 1000 on every pass of the loop
finalScroll = 1000

while True:
    # find_elements returns a (possibly empty) list; a class attribute with
    # several names needs a CSS selector rather than a class-name locator
    see_more = driver.find_elements(By.CSS_SELECTOR, ".inline-show-more-text__button.link")

    # window.scrollTo(x, y) takes absolute coordinates: keep x at 0 and
    # move the viewport down to the pixel value stored in finalScroll
    driver.execute_script(f"window.scrollTo(0, {finalScroll})")
    finalScroll += 1000

    # click every "see more" button found on this pass
    for button in see_more:
        button.click()

    # we will stop the script for 3 seconds so that
    # the data can load
    time.sleep(3)

    end = time.time()

    # We will scroll for 20 seconds.
    if round(end - start) > 20:
        break
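The loop above stops after a fixed 20 seconds whether or not the profile has finished loading. A common alternative is to keep scrolling to the bottom until the page height stops changing. Below is a minimal sketch of that idea; scroll_until_stable, pause and max_rounds are hypothetical names, it assumes the profile loads further sections as you scroll, and it only needs the driver's execute_script method. Note that window.scrollTo(x, y) takes absolute coordinates, horizontal first, so the vertical offset belongs in the second argument.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=30):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with Selenium's execute_script(script, *args) method
    (hypothetical helper, not part of Selenium). Returns the number of scroll
    rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom so lazily loaded sections are requested
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new loaded: this is the real bottom
        last_height = new_height
    return max_rounds  # safety cap reached
```

Calling scroll_until_stable(driver) before collecting the 'see more' buttons removes the guessed time limit; max_rounds is only a safety cap for pages that keep growing.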

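My script does not click the 'see more' buttons. I also tried locating the 'see more' element by XPath, but it gives the following error: NoSuchElementException: Message: no such element: Unable to locate element: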
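find_element raises NoSuchElementException when nothing matches at the moment it is called, while find_elements returns an empty list. If the button is rendered only after scrolling or after an asynchronous request (one possible cause here, not something the error alone confirms), polling gives it time to appear. Selenium ships WebDriverWait in selenium.webdriver.support.ui for exactly this; the sketch below is a dependency-free version of the same idea. wait_for_elements and its parameters are hypothetical names, and locator strings such as "xpath" and "css selector" are the values of Selenium's By.XPATH and By.CSS_SELECTOR.

```python
import time


def wait_for_elements(driver, by, selector, timeout=10.0, poll=0.5):
    """Poll driver.find_elements(by, selector) until it matches or time runs out.

    Hypothetical helper. Returns the list of matches, which is empty if
    nothing appeared before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns [] instead of raising NoSuchElementException
        found = driver.find_elements(by, selector)
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)
```

For example, wait_for_elements(driver, "css selector", ".inline-show-more-text__button") waits up to 10 seconds for the buttons; the class name comes from the question and may no longer match LinkedIn's current markup.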

To scrape the Experience section, I wrote the following code:

job_src = driver.page_source
soup = BeautifulSoup(job_src, 'lxml')

# Getting the HTML of the Experience section in the profile
experience = soup.find("section", {"id": "experience-section"}).find('ul')
jobs = experience.find_all('li', {'class': 'pv-entity__position-group-pager pv-profile-section__list-item ember-view'})

job_details = {}

# Looping through every experience to get its details
for count, job in enumerate(jobs, start=1):
    title = job.find('h3', {'class': 't-16 t-black t-bold'}).get_text()
    company = job.find('p', {'class': 'pv-entity__secondary-title t-14 t-black t-normal'}).get_text()
    # named "dates" rather than "time" so the time module imported earlier is not shadowed
    dates = job.find("h4", {'class': 'pv-entity__date-range t-14 t-black--light t-normal'}).find('span', {'class': None}).get_text()
    duration = job.find("h4", {'class': 't-14 t-black--light t-normal'}).find('span', {'class': 'pv-entity__bullet-item-v2'}).get_text()
    location = job.find("h4", {'class': 'pv-entity__location t-14 t-black--light t-normal block'}).find('span', {'class': None}).get_text()
    desc_temp = job.find("div", {'class': 'inline-show-more-text inline-show-more-text--is-collapsed pv-entity__description t-14 t-black t-normal'})
    # find() returns None when nothing matches; type(desc_temp) is never None, so test the object itself
    description = 'NA' if desc_temp is None else desc_temp.get_text()

    # Storing each experience's details in a dictionary
    job_dict = {
        'company': company,
        'title': title,
        'time': dates,
        'duration': duration,
        'location': location,
        'description': description
    }

    # Adding this experience to the results under the key 'job1', 'job2', ...
    job_details['job' + str(count)] = job_dict
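Each field in the loop is read through a chain of .find(...) calls ending in .get_text(). If one entry lacks an element, for example a location, find returns None and the next call raises AttributeError, which aborts the whole loop. Below is a minimal sketch of a helper that walks the chain and falls back to 'NA', the placeholder the code already uses for missing descriptions. text_or_na is a hypothetical name; it relies only on the find(name, attrs) and get_text() methods that BeautifulSoup tags provide.

```python
def text_or_na(tag, *steps, default="NA"):
    """Follow tag.find(name, attrs) for each (name, attrs) step.

    Hypothetical helper. Returns the final element's text with surrounding
    whitespace stripped, or `default` as soon as any step finds nothing.
    """
    for name, attrs in steps:
        if tag is None:
            return default  # an earlier find() missed
        tag = tag.find(name, attrs)
    return default if tag is None else tag.get_text().strip()
```

For example, location = text_or_na(job, ("h4", {'class': 'pv-entity__location t-14 t-black--light t-normal block'}), ("span", {'class': None})) replaces the location line and yields 'NA' instead of an exception when the element is missing.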

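I don't know exactly why clicking 'see more' directly does not work, but in this case locating the button and clicking it through JavaScript with execute_script does: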

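from selenium.webdriver.common.by import By

button = driver.find_element(By.XPATH, "//*[@class='feed-shared-inline-show-more-text__see-more-less-toggle see-more t-14 t-black--light t-normal hoverable-link-text']")
# a JavaScript click skips the visibility/overlap checks that a native .click() performs
driver.execute_script("arguments[0].click();", button)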
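find_element returns only the first match, and an XPath test like @class='...' succeeds only when the whole class attribute matches character for character, so one extra or reordered class breaks it. The sketch below applies the same JavaScript click to every element that carries a given class token, using the usual XPath 1.0 whole-word class test. class_token_xpath and click_all_with_js are hypothetical helper names; the driver is assumed to be a Selenium 4 WebDriver, whose find_elements accepts the locator string "xpath".

```python
def class_token_xpath(token):
    """XPath matching any element whose class list contains `token` as a whole word."""
    return ("//*[contains(concat(' ', normalize-space(@class), ' '), "
            f"' {token} ')]")


def click_all_with_js(driver, token):
    """Click every element carrying the class `token` via JavaScript; return how many."""
    buttons = driver.find_elements("xpath", class_token_xpath(token))
    for button in buttons:
        # same JavaScript click as the answer, applied to every match instead of the first
        driver.execute_script("arguments[0].click();", button)
    return len(buttons)
```

With the class from the answer, click_all_with_js(driver, "see-more") clicks every matching toggle currently on the page and returns how many it clicked.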


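Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.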

 