
Web scraping returns list of None

import requests
from bs4 import BeautifulSoup
import csv
from itertools import zip_longest

job_title = []
company_name = []
location_name = []
job_skill = []
links = []
salary = []


result = requests.get("https://wuzzuf.net/search/jobs/?q=python%5C&a=hpb")
source = result.content
soup = BeautifulSoup(source, "lxml")

job_titles = soup.find_all("h2", {"class": "css-m604qf"})
company_names = soup.find_all("a", {"class": "css-17s97q8"})
location_names = soup.find_all("span", {"class": "css-5wys0k"})
job_skills = soup.find_all("div", {"class": "css-y4udm8"})

for i in range(len(job_titles)):
    job_title.append(job_titles[i].text)
    links.append("https://wuzzuf.net" + job_titles[i].find("a").attrs["href"])
    company_name.append(company_names[i].text)
    location_name.append(location_names[i].text)
    job_skill.append(job_skills[i].text)

for link in links:
    result = requests.get(link)
    source = result.content
    soup = BeautifulSoup(source, "lxml")
    salaries = soup.find("span", {"class": "css-4xky9y"})
    salary.append(salaries)

file_list = [job_title, company_name, location_name, job_skill, links, salary]
exported = zip_longest(*file_list)
with open("/Users/Rich/Desktop/JobTutorial.csv", "w") as myfile:
    writer = csv.writer(myfile)
    writer.writerow(["Job titles", "Company names", "Location names", "Job skills", "Links", "Salary"])
    writer.writerows(exported)
print(salary)

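The problem is that the salary lookup returns nothing: when I append its results to a list named salary and print that list, it prints a list of None...

[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

Please help me. Thanks for your help.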

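The salary data is generated dynamically. If you view the page source of a job posting (Ctrl+U in Chrome), you can see that the data is not in the HTML elements, which is why soup.find returns None. It can be found inside a <script> tag instead, in the Wuzzuf.initialStoreState object.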

[Image: page source of a Wuzzuf job details page]

Now you have to parse this JSON to get the job details data. You can use a regular expression to pull it out of the page.
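To see the idea in isolation, here is a minimal offline sketch that needs no network access. The sample_html string and the extract_initial_state helper are made up for illustration; the real page is much larger and its contents will differ. The sketch locates the assignment with a regular expression and then lets json.JSONDecoder.raw_decode read exactly one JSON value, so it does not depend on where the closing semicolon falls, and it returns None when the marker is missing.

```python
import json
import re

# Matches the start of the assignment; the JSON value begins right after it.
MARKER = re.compile(r"Wuzzuf\.initialStoreState\s*=\s*")


def extract_initial_state(html):
    """Return the object assigned to Wuzzuf.initialStoreState, or None if absent."""
    match = MARKER.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever text follows it.
    state, _end = json.JSONDecoder().raw_decode(html, match.end())
    return state


# Hypothetical stand-in for a job page; not the real Wuzzuf markup.
sample_html = """
<html><body>
<script>
Wuzzuf.initialStoreState = {"example": {"salary": "hypothetical value"}}; var other = 1;
</script>
</body></html>
"""

state = extract_initial_state(sample_html)
print(state)
print(extract_initial_state("<html></html>"))
```

On a real response you would pass result.text instead of sample_html.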

Here is working code that parses the dictionary for a single job from that list -

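import json
import re
import requests
# `headers` was not defined in the original snippet; a browser-like User-Agent is assumed here
headers = {"User-Agent": "Mozilla/5.0"}
link = "https://wuzzuf.net/jobs/p/jITGU1cOLq2S-Senior-Python-Developer-SURE-International-Technology-Cairo-Egypt"
result = requests.get(link, headers=headers)
raw_data = re.compile(r'Wuzzuf\.initialStoreState = (.*);').search(result.text)
job_details_dict = json.loads(raw_data.group(1).strip())
job_details_dict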

Sample output -

{'badges': {'landingPage': {'loading': False,
   'providers': None,
   'timestamp': None}},
 'browsingPage': {'sets': {}},
 'coaches': {'coachesContactUs': {}, 'coachesPartner': {}},
.................

Now you just need to pull the data you want (for example, the salary) out of this dictionary.
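The sample output above is truncated, so the exact location of the salary inside the dictionary is not shown. One way to find it is to walk the nested structure and list every key whose name contains "salary". The sketch below does that; the find_keys helper and the sample dictionary are hypothetical and only stand in for job_details_dict.

```python
def find_keys(obj, needle, path=()):
    """Yield (path, value) for every dict key containing `needle` (case-insensitive)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_path = path + (key,)
            if needle.lower() in str(key).lower():
                yield new_path, value
            # Keep descending, since matches can be nested at any depth.
            yield from find_keys(value, needle, new_path)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_keys(item, needle, path + (index,))


# Hypothetical structure for illustration; the real dictionary will differ.
sample = {
    "jobs": [
        {"title": "Developer", "details": {"salary": {"min": 1, "max": 2}}},
        {"title": "Tester", "details": {}},
    ]
}

for path, value in find_keys(sample, "salary"):
    print(path, value)
```

Once the path is known, you can index job_details_dict directly inside the loop over links and append that value to the salary list instead of the result of soup.find.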
