
Using Beautiful Soup to scrape data from Indeed

I am trying to scrape resumes with Beautiful Soup (bs4), but I have run into a problem. Here is the sample site: https://www.indeed.com/resumes?q=java&l=&cb=jt

Here is my code:

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

def scrape_job_title(soup): 
    job = []
    for div in soup.find_all(name='li', attrs={'class':'sre'}):
        for a in div.find_all(name='a', attrs={'class':'app-link'}):
            job.append(a['title'])
        return(job)
scrape_job_title(soup)

It returns nothing but an empty list: []

[image]

As the image shows, I want to get the job title "Java Developer".

The class is app_link, not app-link. Also, a['title'] will not do what you want. Use a.contents[0] instead.

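import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

def scrape_job_title(soup):
    job = []
    # Each result is an <li class="sre">; the link inside it has class "app_link".
    for li in soup.find_all(name='li', attrs={'class': 'sre'}):
        for a in li.find_all(name='a', attrs={'class': 'app_link'}):
            # contents[0] is the link's first child node, here the title text.
            job.append(a.contents[0])
    # Return only after every result has been visited.
    return job

print(scrape_job_title(soup))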
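
To see why the class name and the attribute lookup matter, the same logic can be run offline against a small HTML string. This is only a sketch: the markup in SAMPLE_HTML is invented for illustration (it assumes each result is an li with class sre containing an a with class app_link and no title attribute), and the real page may be structured differently. The function name scrape_job_titles is likewise hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; the real page may differ.
SAMPLE_HTML = """
<ol>
  <li class="sre"><a class="app_link" href="/r/1">Java Developer</a></li>
  <li class="sre"><a class="app_link" href="/r/2">Senior Java Engineer</a></li>
  <li class="other"><a class="app_link" href="/r/3">Not a result</a></li>
</ol>
"""

def scrape_job_titles(soup):
    """Collect the link text of every a.app_link found inside an li.sre."""
    titles = []
    for li in soup.find_all("li", class_="sre"):
        for a in li.find_all("a", class_="app_link"):
            # get_text() returns the visible text of the link.
            titles.append(a.get_text(strip=True))
    return titles

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
titles = scrape_job_titles(sample_soup)

# The misspelled class from the question matches nothing.
wrong_class = sample_soup.find_all("a", class_="app-link")

# In this sample the links carry no title attribute, so a['title'] would
# raise KeyError; .get() returns None instead.
first_link = sample_soup.find("a", class_="app_link")
title_attr = first_link.get("title")

print(titles)
print(len(wrong_class), title_attr)
```

Testing against a fixed string like this separates selector mistakes from network issues such as the site returning a different page to a script than to a browser.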

Try this to get all the job titles:

import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html5lib')

for items in soup.select('.sre'):
    data = [item.text for item in items.select('.app_link')]
    print(data)
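
The loop above prints one list per .sre element. If a single flat list is preferred, a descendant selector can do the nesting in one call. The following is a sketch under assumptions: SAMPLE_HTML is made-up markup standing in for the real page, select_job_titles is a hypothetical helper, and html.parser is used here only so the example runs without installing html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; the live page may differ.
SAMPLE_HTML = """
<div>
  <ol>
    <li class="sre"><a class="app_link" href="/r/1">Java Developer</a></li>
    <li class="sre"><a class="app_link" href="/r/2">Backend Engineer (Java)</a></li>
  </ol>
  <a class="app_link" href="/help">Help</a>
</div>
"""

def select_job_titles(html):
    """Return the text of every .app_link element nested inside a .sre element."""
    soup = BeautifulSoup(html, "html.parser")
    # ".sre .app_link" is a CSS descendant selector: it matches .app_link
    # elements only when they have an ancestor with class sre.
    return [link.get_text(strip=True) for link in soup.select(".sre .app_link")]

flat_titles = select_job_titles(SAMPLE_HTML)
print(flat_titles)
```

The Help link is skipped because it sits outside any .sre element, which is the same filtering the nested loop performs.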

