Scraping LinkedIn job requirements

Question

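I'm new to Python and hope someone here can help. As a learning exercise, I'm building a program that scrapes information from LinkedIn job ads. It has gone well so far, but I've hit a wall on one particular problem: I'm trying to scrape the full job description, including the qualifications. I've identified the XPath for the description and can reference it like this: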

desc_xpath = '/html/body/main/section/div[2]/section[2]/div'

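This gives me almost all of the job description, but not the qualifications part of the LinkedIn job profile. I get the high-level, wordy elements of each job profile, but the more detailed sections, such as responsibilities, qualifications, and additional qualifications, don't seem to be captured by this reference.

Can anyone help?

Kind regards

D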

Example code

driver.get('https://www.linkedin.com/jobs/view/etl-developer-at-barclays-2376164866/?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic&originalSubdomain=uk')

time.sleep(3)

#job description
jobdesc_xpath = '/html/body/main/section[1]/section[3]/div/section/div'

job_descs = driver.find_element_by_xpath(jobdesc_xpath).text

print(job_descs)

Answer 1

Selenium struggles to retrieve text that is spread across different child tags. You could try an HTML parser such as BeautifulSoup. Try this:

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

url = 'https://www.linkedin.com/jobs/view/etl-developer-at-barclays-2376164866/?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic&originalSubdomain=uk'
driver.get(url)
# Find the job description (find_element_by_xpath was removed in Selenium 4.3; find_element(By.XPATH, ...) is the current form)
job_desc = driver.find_element(By.XPATH, '//div[@class="show-more-less-html__markup show-more-less-html__markup--clamp-after-5"]')
# Get the HTML of the element and pass it to the BeautifulSoup parser
soup = BeautifulSoup(job_desc.get_attribute('outerHTML'), 'html.parser')
# By default get_text() joins all the text on one line. Use separator='\n' to put each paragraph on a new line, or '\n\n' to leave an empty line between paragraphs
print(soup.get_text(separator='\n\n'))
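
To see what the separator does without opening a browser, here is a minimal sketch that runs BeautifulSoup on a small static snippet. The markup in SAMPLE_HTML and the helper name extract_text are made up for illustration; this is not LinkedIn's real HTML, only a stand-in with text nested in several child tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a job description (an assumption,
# not LinkedIn's actual page structure).
SAMPLE_HTML = """
<div class="description">
  <p>We are hiring an ETL developer.</p>
  <strong>Qualifications</strong>
  <ul>
    <li>SQL</li>
    <li>Python</li>
  </ul>
</div>
"""


def extract_text(html: str) -> str:
    """Return the text of every nested tag, one fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # strip=True trims each text fragment and skips whitespace-only ones,
    # so the indentation in the markup does not leak into the result.
    return soup.get_text(separator="\n", strip=True)


text = extract_text(SAMPLE_HTML)
print(text)
```

In the Selenium version, the string passed to extract_text would be the outerHTML of the description element rather than a hard-coded snippet.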


Solution 1
1 accepted 2021-01-31 00:00:56
