
Web scraping of multiple web pages with python

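I am scraping job listings for "junior UX designer" in "the Netherlands". The site returns 6 pages of results for this search, so if each page holds 15 vacancies I should end up with roughly 90 in total. When I write the results to a JSON file I do get 90 rows, but many of them are duplicates and many vacancies are missing from the file altogether.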

This is the code I am using:

import requests
from bs4 import BeautifulSoup
import json

jobs_NL = []
for i in range(1,7):
  url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start="+str(i)
  
  print("Getting page",i)
  
  page = requests.get(url)

  html = BeautifulSoup(page.content, "html.parser")

  job_title = html.find_all("table", class_="jobCard_mainContent")

  for item in job_title:
      title = item.find("h2").get_text() 
      company = item.find("span", class_="companyName").get_text()
      location = item.find("div", class_="companyLocation").get_text()

      if item.find("div", class_="salary-snippet") != None:
        salary = item.find("div", class_="heading6 tapItem-gutter metadataContainer").get_text()
      else:
        salary = "No salary found"

      vacancy = {
          "title": title,
          "company": company,
          "location": location,
          "salary": salary
          }
      jobs_NL.append(vacancy)
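
The listing above stops once `jobs_NL` is filled; the JSON step mentioned in the question is not shown. A minimal sketch of how that step might look is given here, together with a duplicate check that makes the symptom easy to measure. The helper names `dedupe` and `save_jobs` and the file name `jobs_nl.json` are hypothetical, and treating title, company and location together as the identity of a vacancy is an assumption.

```python
import json


def dedupe(vacancies):
    """Drop repeated vacancies, keeping the first occurrence of each.

    Assumption: two rows with the same title, company and location
    describe the same vacancy.
    """
    seen = set()
    unique = []
    for vacancy in vacancies:
        key = (vacancy["title"], vacancy["company"], vacancy["location"])
        if key not in seen:
            seen.add(key)
            unique.append(vacancy)
    return unique


def save_jobs(vacancies, path="jobs_nl.json"):
    """Write the de-duplicated vacancies to `path`; return how many rows were dropped."""
    unique = dedupe(vacancies)
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps characters such as the euro sign readable.
        json.dump(unique, fh, ensure_ascii=False, indent=2)
    return len(vacancies) - len(unique)


# Small in-memory sample in the same shape as the dictionaries built above.
sample = [
    {"title": "UX Researcher", "company": "Cognizant Technology Solutions",
     "location": "Amsterdam", "salary": "No salary found"},
    {"title": "UX Researcher", "company": "Cognizant Technology Solutions",
     "location": "Amsterdam", "salary": "No salary found"},
    {"title": "iOS developer", "company": "Infoplaza",
     "location": "Houten", "salary": "No salary found"},
]
print(len(sample), "rows,", len(dedupe(sample)), "unique")
```

Calling `save_jobs(jobs_NL)` after the loop would write the file and report how many rows were repeats. A large number of repeats suggests that the requested pages overlap.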

You need to multiply the start variable by 10 to get the correct pages:

import requests
import pandas as pd
from bs4 import BeautifulSoup

jobs_NL = []
for i in range(7):
    # Step `start` in multiples of 10 (0, 10, 20, ...) instead of 1, 2, 3, ...
    url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start={}".format(
        10 * i
    )

    print("Getting page", i)

    page = requests.get(url)
    html = BeautifulSoup(page.content, "html.parser")
    job_title = html.find_all("table", class_="jobCard_mainContent")

    for item in job_title:
        title = item.find("h2").get_text()
        company = item.find("span", class_="companyName").get_text()
        location = item.find("div", class_="companyLocation").get_text()

        if item.find("div", class_="salary-snippet") is not None:
            salary = item.find(
                "div", class_="heading6 tapItem-gutter metadataContainer"
            ).get_text()
        else:
            salary = "No salary found"

        vacancy = {
            "title": title,
            "company": company,
            "location": location,
            "salary": salary,
        }
        jobs_NL.append(vacancy)

df = pd.DataFrame(jobs_NL)
print(df)

Prints:

...
90                                           UX Designer | SaaS Platform                                       StarApple                                     Amersfoort  €3.000 - €4.500 per maand
91                                                    Frontend Developer                                      JustBetter                                        Alkmaar            No salary found
92                                                     Software Engineer                              Infinitas Learning                                    Thuiswerken            No salary found
93                                                         UX Researcher                  Cognizant Technology Solutions                                      Amsterdam            No salary found
94                                            Junior Front End developer                                       StarApple                                 Zeist+1 plaats  €2.500 - €3.000 per maand
95                                  nieuwSenior User Experience Designer                                         Trimble                                     Bodegraven            No salary found
96                                  Senior UX Designer - Research Agency                        Found Professionals B.V.                             Amsterdam+1 plaats            No salary found
97                                                HubSpot marketing lead                                          Comaxx                                         Waalre            No salary found
98                                  nieuwJunior Technisch CRO Specialist                                   Finest People                                 Amsterdam West           €50.000 per jaar
99                                                         iOS developer                                       Infoplaza                                         Houten            No salary found
