Web scraping of multiple web pages with Python
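I am scraping vacancies for "junior UX designer" in "Nederland". The site shows 6 result pages for this search term, so if each page holds 15 vacancies, I should end up with roughly 90 vacancies in total. When I write the results to a JSON file I do get 90 rows, but many of them are duplicates and many vacancies do not appear in the file at all.
This is the code I am using: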
import requests
from bs4 import BeautifulSoup
import json

jobs_NL = []
for i in range(1,7):
    url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start="+str(i)
    print("Getting page",i)
    page = requests.get(url)
    html = BeautifulSoup(page.content, "html.parser")
    job_title = html.find_all("table", class_="jobCard_mainContent")
    for item in job_title:
        title = item.find("h2").get_text()
        company = item.find("span", class_="companyName").get_text()
        location = item.find("div", class_="companyLocation").get_text()
        if item.find("div", class_="salary-snippet") != None:
            salary = item.find("div", class_="heading6 tapItem-gutter metadataContainer").get_text()
        else:
            salary = "No salary found"
        vacancy = {
            "title": title,
            "company": company,
            "location": location,
            "salary": salary
        }
        jobs_NL.append(vacancy)
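You need to multiply the start value by 10 to get the correct pages. With start=1, 2, ..., 6 each request returns almost the same set of results, which explains the duplicates: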
import requests
import pandas as pd
from bs4 import BeautifulSoup

jobs_NL = []
for i in range(7):
    url = "https://nl.indeed.com/vacatures?q=junior+ux+designer&l=Nederland&start={}".format(
        10 * i
    )
    print("Getting page", i)
    page = requests.get(url)
    html = BeautifulSoup(page.content, "html.parser")
    job_title = html.find_all("table", class_="jobCard_mainContent")
    for item in job_title:
        title = item.find("h2").get_text()
        company = item.find("span", class_="companyName").get_text()
        location = item.find("div", class_="companyLocation").get_text()
        if item.find("div", class_="salary-snippet") is not None:
            salary = item.find(
                "div", class_="heading6 tapItem-gutter metadataContainer"
            ).get_text()
        else:
            salary = "No salary found"
        vacancy = {
            "title": title,
            "company": company,
            "location": location,
            "salary": salary,
        }
        jobs_NL.append(vacancy)

df = pd.DataFrame(jobs_NL)
print(df)
Prints:
...
90 UX Designer | SaaS Platform StarApple Amersfoort €3.000 - €4.500 per maand
91 Frontend Developer JustBetter Alkmaar No salary found
92 Software Engineer Infinitas Learning Thuiswerken No salary found
93 UX Researcher Cognizant Technology Solutions Amsterdam No salary found
94 Junior Front End developer StarApple Zeist+1 plaats €2.500 - €3.000 per maand
95 nieuwSenior User Experience Designer Trimble Bodegraven No salary found
96 Senior UX Designer - Research Agency Found Professionals B.V. Amsterdam+1 plaats No salary found
97 HubSpot marketing lead Comaxx Waalre No salary found
98 nieuwJunior Technisch CRO Specialist Finest People Amsterdam West €50.000 per jaar
99 iOS developer Infoplaza Houten No salary found
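
Since the question also mentions duplicates in the JSON output, here is a minimal sketch of the two ideas involved: generating the start values in steps of 10, and dropping repeated vacancies before serializing. It assumes, as the answer suggests, that start advances by 10 per page. The helper names start_offsets and dedupe and the sample vacancies are hypothetical and only serve as illustration; no request is sent to the site.

```python
import json

# Assumption: `start` advances by 10 per page, as the answer suggests.
PAGE_STEP = 10


def start_offsets(n_pages, step=PAGE_STEP):
    """Return the `start` value for each page: 0, step, 2 * step, ..."""
    return [step * page for page in range(n_pages)]


def dedupe(vacancies):
    """Keep the first vacancy for each (title, company, location) triple."""
    seen = set()
    unique = []
    for vacancy in vacancies:
        key = (vacancy["title"], vacancy["company"], vacancy["location"])
        if key not in seen:
            seen.add(key)
            unique.append(vacancy)
    return unique


offsets = start_offsets(6)
print(offsets)

# Hypothetical sample data standing in for scraped results.
sample = [
    {"title": "UX Designer", "company": "Company A", "location": "Amsterdam", "salary": "No salary found"},
    {"title": "UX Designer", "company": "Company A", "location": "Amsterdam", "salary": "No salary found"},
    {"title": "UX Researcher", "company": "Company B", "location": "Utrecht", "salary": "No salary found"},
]
unique = dedupe(sample)

# ensure_ascii=False keeps characters such as the euro sign readable.
print(json.dumps(unique, indent=2, ensure_ascii=False))
```

To write the result to a file instead of printing it, json.dump(unique, f, ensure_ascii=False) on a file opened with encoding="utf-8" should work the same way. Which fields identify a vacancy uniquely is a judgment call; the triple used here is only one reasonable choice.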