
Scraping LinkedIn Job Requirements

I am new to Python and I hope someone on here can help me. I am building a program, as part of my learning, to scrape information from LinkedIn job adverts. So far it has gone well, however I seem to have hit a brick wall with this particular issue. I am attempting to scrape the full job description, including the qualifications. I have identified the XPath for the description and am able to reference it via the following:

desc_xpath = '/html/body/main/section/div[2]/section[2]/div'

This gives me nearly all of the job description information, however it does not include the qualifications section of a LinkedIn job profile. I extract the high-level, wordy element of each job profile, but the further drill-downs such as responsibilities, qualifications, and extra qualifications do not seem to get pulled by this reference.

Is anybody able to help?

Kind regards

D

Example Code

from selenium import webdriver  # imports and driver setup added for completeness
import time

driver = webdriver.Chrome()  # any Selenium-supported browser driver works here

driver.get('https://www.linkedin.com/jobs/view/etl-developer-at-barclays-2376164866/?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic&originalSubdomain=uk')
time.sleep(3)

# job description
jobdesc_xpath = '/html/body/main/section[1]/section[3]/div/section/div'
job_descs = driver.find_element_by_xpath(jobdesc_xpath).text
print(job_descs)

Selenium struggles to extract text that is spread across different sub-tags. You could try using an HTML parser, such as BeautifulSoup. Try this:

from bs4 import BeautifulSoup

# `driver` is the Selenium WebDriver instance set up in the question
url = 'https://www.linkedin.com/jobs/view/etl-developer-at-barclays-2376164866/?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic&originalSubdomain=uk'
driver.get(url)
# Find the job description container
job_desc = driver.find_element_by_xpath('//div[@class="show-more-less-html__markup show-more-less-html__markup--clamp-after-5"]')
# Get the HTML of the element and pass it into the BeautifulSoup parser
soup = BeautifulSoup(job_desc.get_attribute('outerHTML'), 'html.parser')
# By default the parser puts every paragraph on the same line. Use separator='\n'
# to print each paragraph on a new line, or '\n\n' to leave an empty line between paragraphs
print(soup.get_text(separator='\n\n'))
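
If the goal is specifically the qualifications rather than the whole description, one option is to pull the bullet points out of the parsed markup. Below is a minimal sketch, assuming the `soup` object built above and assuming LinkedIn renders the drill-down sections (responsibilities, qualifications, and so on) as `<li>` items inside the same description container:

# Assumes `soup` is the BeautifulSoup object created from the job description above.
# If LinkedIn's markup puts the qualification points in <li> tags, they can be
# collected individually like this:
bullets = [li.get_text(strip=True) for li in soup.find_all('li')]
for bullet in bullets:
    print(bullet)

If the bullets turn out to sit under a specific heading, you could first locate the surrounding list element and apply the same get_text call to just that part of the markup.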
