Web scraping information from multiple pages into a pandas DataFrame
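I want to write some code that scrapes data from multiple pages of a job-listing website. However, when I run my code at the moment, I only get the last page rather than a list covering all the pages I scraped.
Here is my code: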
    url = 'https://ng.indeed.com/jobs?q=Business+Intelligence+Analyst&l=Nigeria&start='
    for i in range(0,80,10):
        page = requests.get(url+str(i))
        soup = BeautifulSoup(page.text, 'html.parser')
        jobs = []
        for div in soup.find_all(name='div',attrs={'class':'row'}):
            for a in div.find_all(name='a', attrs={'data-tn-element':'jobTitle'}):
                jobs.append(a['title'])
        summaries = []
        divs = soup.findAll('div', attrs={'class':'summary'})
        for d in divs:
            summaries.append(d.text.strip())
    jobs = pd.DataFrame(
        {'title': extract_title(soup),
         'summary': extract_summary(soup)
        })
    jobs
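I use the first for loop to iterate over each page (page 2 = 10, page 3 = 20, and so on). The ideal output is a DataFrame containing every job title and the summary of each job. However, I only get a DataFrame with the jobs from the last page.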
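The symptom can be reproduced without any network access. The sketch below is only an illustration: `fake_pages`, `collect_resetting`, and `collect_accumulating` are made-up names, and the inner lists stand in for the titles found on each results page. It shows the `start` offsets that `range(0, 80, 10)` generates, and how re-creating a list inside the loop throws away everything gathered on earlier pages.

```python
# Made-up stand-in for scraped pages: each inner list plays the role
# of the job titles found on one results page.
fake_pages = [["job A", "job B"], ["job C"], ["job D", "job E"]]

# The `start` offsets used in the URL, one per results page.
offsets = list(range(0, 80, 10))


def collect_resetting(pages):
    """Re-create the list on every iteration: earlier pages are lost."""
    for page in pages:
        titles = []  # reset for each page
        titles.extend(page)
    return titles


def collect_accumulating(pages):
    """Create the list once, before the loop: every page is kept."""
    titles = []
    for page in pages:
        titles.extend(page)
    return titles


print(offsets)
print(collect_resetting(fake_pages))
print(collect_accumulating(fake_pages))
```

The first function returns only the last page's items, which matches the behaviour described in the question; the second keeps all of them.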
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    summaries = []  # <-- outside of the loop
    jobs = []       # <-- outside of the loop

    url = 'https://ng.indeed.com/jobs?q=Business+Intelligence+Analyst&l=Nigeria&start='
    for i in range(0, 80, 10):
        # fetch and parse one results page per offset
        page = requests.get(url + str(i))
        soup = BeautifulSoup(page.text, 'html.parser')
        for div in soup.find_all(name='div', attrs={'class': 'row'}):
            for a in div.find_all(name='a', attrs={'data-tn-element': 'jobTitle'}):
                jobs.append(a['title'])
        divs = soup.find_all('div', attrs={'class': 'summary'})
        for d in divs:
            summaries.append(d.text.strip())

    # build the DataFrame once, after all pages have been collected
    jobs = pd.DataFrame({'title': jobs,           # <--- put only jobs here
                         'summary': summaries})   # <--- put only summaries here
    print(jobs)
Prints:
title summary
0 Analyst, Customer Intelligence (Supervisory) Provide Intelligence To Support Business Plann...
1 Business Intelligence Analyst Demonstrable work experience in business intel...
2 Manager, Business Intelligence Provide Business Intelligence Services For CEO...
3 MTNN Need Digital Communication Analyst Work With Individual Units (Corporate Communic...
4 MARKET RESEARCH & BUSINESS INTELLIGENCE OFFICER Implement the overall analytics and business i...
.. ... ...
80 Research Analyst and Associates Experience of Business Intelligence tools.\nIn...
81 Financial Analyst Perform market research, data mining, business...
82 Oracle E-business Suite Developer (Fusion) Work Directly with Business user as an oracle ...
83 Junior Oracle Developer Work Directly with Business user as an oracle ...
84 Credit Analyst at CARS45 Limited High business research skills acumen.\nUnderst...
[85 rows x 2 columns]
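One caveat: `pd.DataFrame({'title': jobs, 'summary': summaries})` raises a `ValueError` if the two lists end up with different lengths, which can happen when a job card has a title but no summary. A possible variation, sketched below, builds one record per job card so each title stays paired with its own summary. The sample HTML is made up, `parse_jobs` is a hypothetical helper, and the sketch assumes the summary `div` sits inside the same `row` `div` as the title link, which may not hold for the live page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up HTML imitating the structure the selectors above expect.
# The second card deliberately has no summary.
SAMPLE_HTML = """
<div class="row">
  <a data-tn-element="jobTitle" title="Business Intelligence Analyst">BI Analyst</a>
  <div class="summary"> Demonstrable work experience. </div>
</div>
<div class="row">
  <a data-tn-element="jobTitle" title="Financial Analyst">Financial Analyst</a>
</div>
"""


def parse_jobs(html):
    """Return one dict per job card (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", attrs={"class": "row"}):
        link = card.find("a", attrs={"data-tn-element": "jobTitle"})
        if link is None:
            continue  # not a job card
        summary = card.find("div", attrs={"class": "summary"})
        rows.append({
            "title": link.get("title"),
            # None when the card has no summary element
            "summary": summary.get_text(strip=True) if summary else None,
        })
    return rows


all_rows = []  # created once, before the loop over pages
for html in [SAMPLE_HTML, SAMPLE_HTML]:  # stand-in for one response per page
    all_rows.extend(parse_jobs(html))

df = pd.DataFrame(all_rows, columns=["title", "summary"])
print(df)
```

In a real run, the list of sample strings would be replaced by `requests.get(url + str(i)).text` for each offset.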