
Click on all 'see more' buttons and scrape all data from a LinkedIn profile using Selenium and BeautifulSoup

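I am creating a general-purpose Python script to scrape any LinkedIn profile. In the Experience section, job descriptions are often hidden and only shown after clicking 'see more'. I want to click every 'see more' button and scrape the full descriptions. I tried the following code to open the profile, scroll the page, and click the buttons: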

from selenium import webdriver
from bs4 import BeautifulSoup
import time

# Creating an instance
# Raw string so the backslashes in the Windows path are kept literally
driver = webdriver.Chrome(r"C:\WebDrivers\chromedriver.exe")

# Logging into LinkedIn
driver.get("https://linkedin.com/uas/login")
time.sleep(5)

username = driver.find_element_by_id("username")
username.send_keys("your login email") # Enter Your Email Address

pword = driver.find_element_by_id("password")
pword.send_keys("your password") # Enter Your Password

driver.find_element_by_xpath("//button[@type='submit']").click()

# Opening My Profile
# paste the URL of my profile here
profile_url = "https://www.linkedin.com/in/warikoo/"

driver.get(profile_url)  # this will open the link

start = time.time()

# will be used in the while loop
initialScroll = 0
finalScroll = 1000

while True:
   see_more = driver.find_elements_by_class('inline-show-more-text__button link') 
   
   driver.execute_script(f"window.scrollTo({initialScroll}, {finalScroll})")
   # this command scrolls the window starting from    
   # the pixel value stored in the initialScroll
   # variable to the pixel value stored at the
   # finalScroll variable
   initialScroll = finalScroll
   finalScroll += 1000
   
   if see_more:
       see_more.click()
   
   # we will stop the script for 3 seconds so that
   # the data can load
   time.sleep(3)

   end = time.time()
   
   # We will scroll for 20 seconds.
   if round(end - start) > 20:
        break

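The code above does not click the 'see more' buttons. I also tried locating the 'see more' element by XPath, but that raises the following error: NoSuchElementException: Message: no such element: Unable to locate element:

To scrape the Experience section, I wrote the following code: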

job_src = driver.page_source
soup = BeautifulSoup(job_src, 'lxml')

# Getting the HTML of the Experience section in the profile
experience = soup.find("section", {"id": "experience-section"}).find('ul')
jobs = experience.find_all('li', {'class': 'pv-entity__position-group-pager pv-profile-section__list-item ember-view'})

job_details={}
count = 0

#Looping through every experience to get its detail
for job in jobs :
    title = job.find('h3', {'class': 't-16 t-black t-bold'}).get_text()
    company = job.find('p', {'class': 'pv-entity__secondary-title t-14 t-black t-normal'}).get_text()
    # Named date_range rather than time, so the imported time module is not shadowed
    date_range = job.find("h4", {'class':'pv-entity__date-range t-14 t-black--light t-normal'}).find('span', {'class': None}).get_text()
    duration = job.find("h4", {'class':'t-14 t-black--light t-normal'}).find('span', {'class': 'pv-entity__bullet-item-v2'}).get_text()
    location = job.find("h4", {'class':'pv-entity__location t-14 t-black--light t-normal block'}).find('span', {'class': None}).get_text()
    desc_temp = job.find("div", {'class':'inline-show-more-text inline-show-more-text--is-collapsed pv-entity__description t-14 t-black t-normal'})
    # find() returns None when the job has no description block
    description = 'NA' if desc_temp is None else desc_temp.get_text()

    # Storing the details of each experience in a dictionary
    job_dict = {
            'company': company,
            'title': title,
            'time': date_range,
            'duration': duration,
            'location': location,
            'description': description
           }

    count = count + 1

    #Updating the dictionary with every loop iteration
    job_details.update({'job'+str(count): job_dict})
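
BeautifulSoup's find() returns None when nothing matches, so the chained .get_text() calls in the loop above raise AttributeError for any job that lacks one of the fields. A minimal sketch of a guard follows; text_or_default is a hypothetical helper name, not part of BeautifulSoup, and it only assumes the node offers get_text(strip=True) as BeautifulSoup tags do.

```python
def text_or_default(node, default="NA"):
    """Return the stripped text of a parsed node, or `default` if node is None.

    `node` is expected to be the result of a BeautifulSoup find() call,
    which is None when no matching element exists.
    """
    if node is None:
        return default
    return node.get_text(strip=True)
```

It could be used as location = text_or_default(job.find("h4", {...})), with the class dictionary taken from the loop above. Nested lookups such as .find(...).find('span') would still need the outer result checked before the inner call.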

I don't know the exact reason why clicking 'see more' directly does not work, but in this case the following will work:

button = driver.find_element_by_xpath("//*[@class='feed-shared-inline-show-more-text__see-more-less-toggle see-more t-14 t-black--light t-normal hoverable-link-text']")
driver.execute_script("arguments[0].click();", button)
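
The snippet above clicks a single button. To cover every 'see more' button on the page, the same JavaScript click can be applied to each match, as in the sketch below. Two things here are assumptions rather than part of the original answer: the helper names css_for_classes and click_all are hypothetical, and the lookup uses the generic find_elements(by, value) call with the strategy string "css selector" (the value of Selenium's By.CSS_SELECTOR). A CSS selector is used because Selenium's class-name locator accepts a single class only, so a space-separated value like the one in the question needs to be rewritten as a chain of classes.

```python
def css_for_classes(class_attr):
    """Turn a space-separated class attribute into a CSS selector
    that matches elements carrying all of those classes."""
    return "." + ".".join(class_attr.split())


def click_all(driver, css_selector):
    """Click every element matching css_selector through JavaScript.

    Returns the number of elements that were clicked.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    buttons = driver.find_elements("css selector", css_selector)
    for button in buttons:
        # JavaScript click, the same technique as the snippet above
        driver.execute_script("arguments[0].click();", button)
    return len(buttons)
```

For example, click_all(driver, css_for_classes("inline-show-more-text__button link")) could be called after each scroll step, before reading driver.page_source. Whether those class names match LinkedIn's current markup is not verified here; they are copied from the question.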

