
LinkedIn scraper to extract skills

I'm trying to scrape people's public profiles to find the most common skills for certain roles. I can extract the email, company, name, position, etc., but I can't get the skills. I'm using Selector from parsel. I've tried many approaches, but I'm clearly targeting the wrong class, and I should probably loop through the skills. Here is my code so far:

from time import sleep

from parsel import Selector

# _DRIVER_CHROME is assumed to be a Selenium Chrome WebDriver created elsewhere.

def linkedin_scrape(linkedin_urls):
    profiles = []

    for url in linkedin_urls:
        _DRIVER_CHROME.get(url)
        sleep(5)  # crude wait for the page to finish loading

        selector = Selector(text=_DRIVER_CHROME.page_source)

        # Use xpath to extract the exact class containing the profile name
        name = selector.xpath('//*[starts-with(@class, "inline")]/text()').extract_first()
        if name:
            name = name.strip()

        # Use xpath to extract the exact class containing the profile position
        position = selector.xpath('//*[starts-with(@class, "mt1")]/text()').extract_first()
        if position:
            # Keep only the text before ' at '; split() is safe when ' at ' is absent
            position = position.strip().split(' at ')[0]

        # Use xpath to extract the exact class containing the profile company
        company = selector.xpath('//*[starts-with(@class, "text-align-left")]/text()').extract_first()
        if company:
            company = company.strip()

        # Use xpath to extract skills (this is the part that does not work)
        skills = selector.xpath('//*[starts-with(@class, "pv-skill")]/text()').extract_first()
        if skills:
            skills = skills.strip()

        profiles.append([name, position, company, url])
        print(f'{len(profiles)}: {name}, {position}, {company}, {url}, {skills}')

    return profiles

To capture all skills, you first need to expand the skills section so that it displays every skill, and then target the elements whose class name starts with 'pv-skill-category-entity__name-text'.

This has worked for me as of today.

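from selenium.webdriver.common.by import By

# driver is the Selenium WebDriver (_DRIVER_CHROME in the question).
# Newer Selenium 4 releases removed find_element_by_*; use find_element(By..., ...) instead.

# Locate the control that expands the skills section, then click it
show_more_skills_button = driver.find_element(By.CLASS_NAME, "pv-skills-section__chevron-icon")
show_more_skills_button.click()

# Collect every element whose class starts with the skill-name prefix
skills = driver.find_elements(By.XPATH, "//*[starts-with(@class,'pv-skill-category-entity__name-text')]")

# Build the list of skill names
skill_set = [skill.text for skill in skills]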
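
Since the stated goal is to find the most common skills for a role, the per-profile skill_set lists still need to be tallied. Here is a minimal sketch of that step using only the standard library. The function name most_common_skills and the sample data are hypothetical, and the lowercasing and per-profile de-duplication are assumptions about how you might want to count, not something the scraper requires.

```python
from collections import Counter


def most_common_skills(skill_sets, top_n=3):
    """Return the top_n (skill, profile_count) pairs across all profiles."""
    counts = Counter()
    for skill_set in skill_sets:
        # Normalise whitespace and case, drop blanks, and de-duplicate per
        # profile so that each profile contributes at most one vote per skill.
        unique = {s.strip().lower() for s in skill_set if s and s.strip()}
        counts.update(unique)
    return counts.most_common(top_n)


# Hypothetical skill lists, one per scraped profile.
sample_skill_sets = [
    ["Python", "SQL", "Machine Learning"],
    ["python ", "SQL", "Tableau"],
    ["Python", "", "Docker"],
]

top_skills = most_common_skills(sample_skill_sets, top_n=2)
print(top_skills)
```

In practice you would append each profile's skill_set to a list inside the scraping loop and pass that list to the function once all URLs have been visited.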
