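Click on all 'see more' buttons and scrape all data from a LinkedIn profile using selenium and beautifulsoup
I am creating a general-purpose Python script to scrape any LinkedIn profile. In the experience section, descriptions are often collapsed and only shown in full after clicking "see more". I want to click every "see more" button and scrape the whole description. I tried the following code to open the profile, scroll the page and click the buttons: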
from selenium import webdriver
from bs4 import BeautifulSoup
import time
# Creating an instance
driver = webdriver.Chrome("C:\WebDrivers\chromedriver.exe")
# Logging into LinkedIn
driver.get("https://linkedin.com/uas/login")
time.sleep(5)
username = driver.find_element_by_id("username")
username.send_keys("your login email") # Enter Your Email Address
pword = driver.find_element_by_id("password")
pword.send_keys("your password") # Enter Your Password
driver.find_element_by_xpath("//button[@type='submit']").click()
# Opening My Profile
# paste the URL of my profile here
profile_url = "https://www.linkedin.com/in/warikoo/"
driver.get(profile_url) # this will open the link
start = time.time()
# will be used in the while loop
initialScroll = 0
finalScroll = 1000
while True:
    see_more = driver.find_elements_by_class('inline-show-more-text__button link')
    driver.execute_script(f"window.scrollTo({initialScroll}, {finalScroll})")
    # this command scrolls the window starting from
    # the pixel value stored in the initialScroll
    # variable to the pixel value stored at the
    # finalScroll variable
    initialScroll = finalScroll
    finalScroll += 1000
    if see_more:
        see_more.click()
    # we will stop the script for 3 seconds so that
    # the data can load
    time.sleep(3)
    end = time.time()
    # We will scroll for 20 seconds.
    if round(end - start) > 20:
        break
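The code above does not click the "see more" buttons. I also tried locating the "see more" element by XPath, but that gives the following error: NoSuchElementException: Message: no such element: Unable to locate element:
To scrape the experience section, I wrote the following code: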
job_src = driver.page_source
soup = BeautifulSoup(job_src, 'lxml')
# Getting the HTML of the Experience section in the profile
experience = soup.find("section", {"id": "experience-section"}).find('ul')
jobs = experience.find_all('li', {'class': 'pv-entity__position-group-pager pv-profile-section__list-item ember-view'})
job_details={}
count = 0
#Looping through every experience to get its detail
for job in jobs:
    title = job.find('h3', {'class': 't-16 t-black t-bold'}).get_text()
    company = job.find('p', {'class': 'pv-entity__secondary-title t-14 t-black t-normal'}).get_text()
    # Named date_range rather than time so the imported time module is not shadowed
    date_range = job.find("h4", {'class':'pv-entity__date-range t-14 t-black--light t-normal'}).find('span', {'class': None}).get_text()
    duration = job.find("h4", {'class':'t-14 t-black--light t-normal'}).find('span', {'class': 'pv-entity__bullet-item-v2'}).get_text()
    location = job.find("h4", {'class':'pv-entity__location t-14 t-black--light t-normal block'}).find('span', {'class': None}).get_text()
    desc_temp = job.find("div", {'class':'inline-show-more-text inline-show-more-text--is-collapsed pv-entity__description t-14 t-black t-normal'})
    # find() returns None when the job has no description block
    description = 'NA' if desc_temp is None else desc_temp.get_text()
    # Store the details of this experience in a dictionary
    job_dict = {
        'company': company,
        'title': title,
        'time': date_range,
        'duration': duration,
        'location': location,
        'description': description
    }
    count = count + 1
    # Add this experience to the overall results on every iteration
    job_details.update({'job'+str(count): job_dict})
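I don't know the exact reason why clicking "see more" directly does not work, but the following works in this case: locate the button, then click it through JavaScript.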
button = driver.find_element_by_xpath("//*[@class='feed-shared-inline-show-more-text__see-more-less-toggle see-more t-14 t-black--light t-normal hoverable-link-text']")
driver.execute_script("arguments[0].click();", button)
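The snippet above clicks a single button, while the question asks for every "see more" on the page. Note also that `find_elements...` returns a list, so calling `.click()` on the result, as in the question's loop, cannot work; each element has to be clicked in turn. Below is a minimal sketch of that idea, not a tested LinkedIn solution. The helper name `click_all` is hypothetical, and it assumes a driver whose `find_elements(by, value)` accepts the locator strategy string "css selector" (the value of Selenium's `By.CSS_SELECTOR`).

```python
# Locator strategy string; this is the value of selenium's By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def click_all(driver, selector):
    """Click every element matching `selector` through JavaScript.

    Returns the number of elements that were clicked without error.
    `click_all` is a hypothetical helper, not part of Selenium.
    """
    clicked = 0
    for element in driver.find_elements(CSS_SELECTOR, selector):
        try:
            # Same JavaScript click as in the answer, applied per element.
            driver.execute_script("arguments[0].click();", element)
            clicked += 1
        except Exception:
            # The element may have gone stale or been detached; skip it.
            continue
    return clicked
```

It could be called as `click_all(driver, "button.inline-show-more-text__button")` after scrolling and before reading `driver.page_source`. That selector is only a guess based on the class name in the question, and LinkedIn's markup may differ.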