Click all "see more" buttons and scrape all data from a LinkedIn profile using Selenium and BeautifulSoup

Question

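I am writing a general-purpose Python script to scrape any LinkedIn profile. In the Experience section, descriptions are often partly hidden and can be revealed by clicking "see more". I want to click every "see more" button and scrape the full descriptions. I tried the following code to open the profile, scroll the page, and click the buttons: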

from selenium import webdriver
from bs4 import BeautifulSoup
import time

# Creating an instance
# raw string so the backslashes in the Windows path are not read as escape sequences
driver = webdriver.Chrome(r"C:\WebDrivers\chromedriver.exe")

# Logging into LinkedIn
driver.get("https://linkedin.com/uas/login")
time.sleep(5)

username = driver.find_element_by_id("username")
username.send_keys("your login email") # Enter Your Email Address

pword = driver.find_element_by_id("password")
pword.send_keys("your password") # Enter Your Password

driver.find_element_by_xpath("//button[@type='submit']").click()

# Opening My Profile
# paste the URL of my profile here
profile_url = "https://www.linkedin.com/in/warikoo/"

driver.get(profile_url)  # this will open the link

start = time.time()

# will be used in the while loop
initialScroll = 0
finalScroll = 1000

while True:
   see_more = driver.find_elements_by_class('inline-show-more-text__button link') 
   
   driver.execute_script(f"window.scrollTo({initialScroll}, {finalScroll})")
   # this command scrolls the window starting from    
   # the pixel value stored in the initialScroll
   # variable to the pixel value stored at the
   # finalScroll variable
   initialScroll = finalScroll
   finalScroll += 1000
   
   if see_more:
       see_more.click()
   
   # we will stop the script for 3 seconds so that
   # the data can load
   time.sleep(3)

   end = time.time()
   
   # We will scroll for 20 seconds.
   if round(end - start) > 20:
        break

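The code above does not click the "see more" buttons. I also tried locating the "see more" element by XPath, but that raises the following error: NoSuchElementException: Message: no such element: Unable to locate element: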

To scrape the Experience section, I wrote the following code:

job_src = driver.page_source
soup = BeautifulSoup(job_src, 'lxml')

# Getting the HTML of the Experience section in the profile
experience = soup.find("section", {"id": "experience-section"}).find('ul')
jobs = experience.find_all('li', {'class': 'pv-entity__position-group-pager pv-profile-section__list-item ember-view'})

job_details={}
count = 0

#Looping through every experience to get its detail
for job in jobs :
    title = job.find('h3', {'class': 't-16 t-black t-bold'}).get_text()
    company = job.find('p', {'class': 'pv-entity__secondary-title t-14 t-black t-normal'}).get_text()
    # named date_range so it does not shadow the imported time module
    date_range = job.find("h4", {'class':'pv-entity__date-range t-14 t-black--light t-normal'}).find('span', {'class': None}).get_text()
    duration = job.find("h4", {'class':'t-14 t-black--light t-normal'}).find('span', {'class': 'pv-entity__bullet-item-v2'}).get_text()
    location = job.find("h4", {'class':'pv-entity__location t-14 t-black--light t-normal block'}).find('span', {'class': None}).get_text()
    desc_temp = job.find("div", {'class':'inline-show-more-text inline-show-more-text--is-collapsed pv-entity__description t-14 t-black t-normal'})
    # find() returns None when nothing matches, so test the value itself
    description = 'NA' if desc_temp is None else desc_temp.get_text()

    # Storing the details of each experience in a dictionary
    job_dict = {
            'company': company,
            'title': title, 
            'time': date_range,
            'duration': duration,
            'location': location,
            'description': description
           }

    count = count + 1

    #Updating the dictionary with every loop iteration
    job_details.update({'job'+str(count): job_dict})
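
A note on the extraction loop: BeautifulSoup's `find` returns `None` when nothing matches, so a chained call such as `job.find(...).find(...).get_text()` raises `AttributeError` as soon as one profile entry lacks a field. Below is a minimal sketch of a guard for that case. The helper name `nested_text` and the `"NA"` default are illustrative assumptions, not part of BeautifulSoup.

```python
def nested_text(root, *steps, default="NA"):
    """Follow a chain of find() calls and return the final node's text.

    Each step is a (tag_name, attrs) pair passed to find().
    Returns `default` as soon as any step finds nothing.
    nested_text is a hypothetical helper, not a BeautifulSoup API.
    """
    node = root
    for name, attrs in steps:
        if node is None:
            return default
        node = node.find(name, attrs)
    if node is None:
        return default
    # strip=True trims the whitespace around the extracted text
    return node.get_text(strip=True)
```

With this, the location lookup could be written as `nested_text(job, ("h4", {"class": "pv-entity__location t-14 t-black--light t-normal block"}), ("span", {"class": None}))`, which yields `"NA"` instead of an exception when the entry has no location.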

Answer 1

I don't know the exact reason why clicking "see more" directly does not work, but in this case the following will work.

button = driver.find_element_by_xpath("//*[@class='feed-shared-inline-show-more-text__see-more-less-toggle see-more t-14 t-black--light t-normal hoverable-link-text']")
driver.execute_script("arguments[0].click();", button)
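
The snippet above clicks a single button. Since the question asks for every "see more" button, here is a rough sketch of the same JavaScript-click idea applied to all matches. Keep in mind that `find_elements` returns a list, so the click has to be issued once per element. The helper name `click_all` is an assumption for illustration. It uses the generic `find_elements(by, value)` call rather than importing `By`, and the selector in the usage note is only a guess based on the class name in the question.

```python
def click_all(driver, css_selector):
    """Click every element matching css_selector through JavaScript.

    Returns the number of elements clicked.
    click_all is a hypothetical helper, not a Selenium API.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    buttons = driver.find_elements("css selector", css_selector)
    clicked = 0
    for button in buttons:
        try:
            # A JavaScript click is dispatched straight to the element
            driver.execute_script("arguments[0].click();", button)
            clicked += 1
        except Exception:
            # The element may have gone stale after an earlier click; skip it
            continue
    return clicked
```

Inside the scroll loop from the question this could be called as `click_all(driver, ".inline-show-more-text__button")` after each scroll. A JavaScript click does not require the button to be in view or unobstructed, which may be why it succeeds where a plain `.click()` does not, though I have not verified that this is the cause here.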
