I'm trying to scrape people's public profiles to find the most common skills for certain roles. I can extract the email, company, name, position, etc., but I can't get the skills. I'm using Selector from parsel. I have tried many approaches, but clearly I'm targeting the wrong class, and I should probably loop through the skills. Here is my code so far:
def linkedin_scrape(linkedin_urls):
    profiles = []
    for url in linkedin_urls:
        _DRIVER_CHROME.get(url)
        sleep(5)
        selector = Selector(text=_DRIVER_CHROME.page_source)
        # Use xpath to extract the exact class containing the profile name
        name = selector.xpath('//*[starts-with(@class, "inline")]/text()').extract_first()
        if name:
            name = name.strip()
        # Use xpath to extract the exact class containing the profile position
        position = selector.xpath('//*[starts-with(@class, "mt1")]/text()').extract_first()
        if position:
            position = position.strip()
            position = position[0:position.find(' at ')]
        # Use xpath to extract the exact class containing the profile company
        company = selector.xpath('//*[starts-with(@class, "text-align-left")]/text()').extract_first()
        if company:
            company = company.strip()
        # Use xpath to extract skills
        skills = selector.xpath('//*[starts-with(@class, "pv-skill")]/text()').extract_first()
        if skills:
            skills = skills.strip()
        profiles.append([name, position, company, url])
        print(f'{len(profiles)}: {name}, {position}, {company}, {url}, {skills}')
    return profiles
To capture all skills, you first need to expand the skills section so that it displays every skill, and then target the elements whose class name starts with 'pv-skill-category-entity__name-text'.
This works for me as of today.
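from selenium.webdriver.common.by import By

# locate the link that expands the skills section
show_more_skills_button = driver.find_element(By.CLASS_NAME, "pv-skills-section__chevron-icon")
# expand the section so that every skill is rendered
show_more_skills_button.click()
# collect every element whose class starts with the skill-name prefix
skills = driver.find_elements(By.XPATH, "//*[starts-with(@class,'pv-skill-category-entity__name-text')]")
# build the list of skill names
skill_set = []
for skill in skills:
    skill_set.append(skill.text)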
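Since the goal in the question is to find the most common skills for certain roles, here is a minimal sketch of how the collected lists could be tallied once each profile has a skill_set. The function name most_common_skills, the (position, skills) pair layout, and the sample data are assumptions made for illustration; they are not part of the code above.

```python
from collections import Counter


def most_common_skills(profiles, top_n=10):
    """Tally skills per position (hypothetical helper, not from the original post).

    `profiles` is assumed to be an iterable of (position, skills) pairs,
    where `skills` is a list of strings such as `skill_set`.
    """
    counts_by_role = {}
    for position, skills in profiles:
        counter = counts_by_role.setdefault(position, Counter())
        # Normalise whitespace and case so "python " and "Python" match.
        # The set makes each profile count a given skill at most once.
        counter.update({s.strip().lower() for s in skills if s and s.strip()})
    return {role: c.most_common(top_n) for role, c in counts_by_role.items()}


# Made-up sample data, only to show the call shape.
sample = [
    ("Data Engineer", ["Python", "SQL", "Spark"]),
    ("Data Engineer", ["python ", "Airflow", "SQL"]),
    ("Designer", ["Figma", ""]),
]
result = most_common_skills(sample, top_n=2)
print(result)
```

To use this with the scraper, you would append something like (position, skill_set) for each profile instead of printing it, and pass the resulting list to the function at the end.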