
Web scraping information from multiple pages into a pandas dataframe

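I want to write some code that scrapes data from multiple pages of a job listing website. However, when I run my code at the moment, I only get the last page rather than a list covering all the pages I scraped.

Here is my code: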

url = 'https://ng.indeed.com/jobs?q=Business+Intelligence+Analyst&l=Nigeria&start='
for i in range(0,80,10):
    page = requests.get(url+str(i))
    soup = BeautifulSoup(page.text, 'html.parser')
    jobs = []
    for div in soup.find_all(name='div',attrs={'class':'row'}):
        for a in div.find_all(name='a', attrs={'data-tn-element':'jobTitle'}):
            jobs.append(a['title'])
    summaries = []
    divs = soup.findAll('div', attrs={'class':'summary'})
    for d in divs:
        summaries.append(d.text.strip())
jobs = pd.DataFrame(
    {'title': extract_title(soup),
     'summary': extract_summary(soup)
    })
jobs

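I use the first for loop to iterate over each page (page 2 = 10, page 3 = 20, and so on). The ideal output is a dataframe with a column of all job titles and a column with the summary of each job. However, I only get a dataframe containing the jobs from the last page.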
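To make the paging scheme concrete, here is a minimal sketch of how the `start` offsets map onto page URLs. The helper name `page_urls` and its parameters are made up for illustration; it only builds the same URLs as the string concatenation above and does not send any request.

```python
from urllib.parse import urlencode

BASE = 'https://ng.indeed.com/jobs'

def page_urls(query, location, pages, step=10):
    """Build one search URL per results page (hypothetical helper).

    The `start` parameter advances by `step` per page: 0, 10, 20, ...
    """
    urls = []
    for start in range(0, pages * step, step):
        params = {'q': query, 'l': location, 'start': start}
        # urlencode turns spaces into '+', matching the hand-written URL
        urls.append(BASE + '?' + urlencode(params))
    return urls

urls = page_urls('Business Intelligence Analyst', 'Nigeria', pages=8)
for u in urls[:3]:
    print(u)
```

With `pages=8` this covers the same offsets as `range(0,80,10)`, that is 0 through 70.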

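Both lists are re-created on every pass of the loop, and the dataframe is built from the last `soup` only, so the earlier pages are discarded. Create the lists once, before the loop, and build the dataframe from them:

import requests
import pandas as pd
from bs4 import BeautifulSoup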


summaries = []   # <-- outside of the loop
jobs = []        # <-- outside of the loop

url = 'https://ng.indeed.com/jobs?q=Business+Intelligence+Analyst&l=Nigeria&start='

for i in range(0,80,10):
    page = requests.get(url+str(i))
    soup = BeautifulSoup(page.text, 'html.parser')
    for div in soup.find_all(name='div',attrs={'class':'row'}):
        for a in div.find_all(name='a', attrs={'data-tn-element':'jobTitle'}):
            jobs.append(a['title'])
    # find_all is the current name; findAll is a legacy alias
    for d in soup.find_all('div', attrs={'class':'summary'}):
        summaries.append(d.text.strip())

jobs = pd.DataFrame({'title': jobs,    # <--- put only jobs here
     'summary': summaries})            # <--- put only summaries here

print(jobs)

Prints:

                                              title                                            summary
0      Analyst, Customer Intelligence (Supervisory)  Provide Intelligence To Support Business Plann...
1                     Business Intelligence Analyst  Demonstrable work experience in business intel...
2                    Manager, Business Intelligence  Provide Business Intelligence Services For CEO...
3           MTNN Need Digital Communication Analyst  Work With Individual Units (Corporate Communic...
4   MARKET RESEARCH & BUSINESS INTELLIGENCE OFFICER  Implement the overall analytics and business i...
..                                              ...                                                ...
80                  Research Analyst and Associates  Experience of Business Intelligence tools.\nIn...
81                                Financial Analyst  Perform market research, data mining, business...
82       Oracle E-business Suite Developer (Fusion)  Work Directly with Business user as an oracle ...
83                          Junior Oracle Developer  Work Directly with Business user as an oracle ...
84                 Credit Analyst at CARS45 Limited  High business research skills acumen.\nUnderst...

[85 rows x 2 columns]
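
One caveat with two separate lists: `pd.DataFrame` raises a `ValueError` when the columns have different lengths, which can happen if some job card has a title but no summary. A possible variation, sketched below, collects one record per card instead. The HTML in `PAGES` is invented stand-in markup so that the example runs offline, and it assumes the summary `div` sits inside the same `row` `div` as the title link, which may not hold for the real pages. `parse_page` is a hypothetical helper name.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for two downloaded result pages.
PAGES = [
    '''
    <div class="row">
      <a data-tn-element="jobTitle" title="Business Intelligence Analyst">x</a>
      <div class="summary"> Build dashboards. </div>
    </div>
    <div class="row">
      <a data-tn-element="jobTitle" title="Financial Analyst">x</a>
    </div>
    ''',
    '''
    <div class="row">
      <a data-tn-element="jobTitle" title="Junior Oracle Developer">x</a>
      <div class="summary">Work with business users.</div>
    </div>
    ''',
]

def parse_page(html):
    """Return one dict per job card so each title stays with its own summary."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for card in soup.find_all('div', attrs={'class': 'row'}):
        link = card.find('a', attrs={'data-tn-element': 'jobTitle'})
        if link is None:
            continue  # this div is not a job card
        summary = card.find('div', attrs={'class': 'summary'})
        rows.append({
            'title': link.get('title'),
            # a missing summary becomes None instead of shifting later rows
            'summary': summary.get_text(strip=True) if summary else None,
        })
    return rows

records = []  # created once, outside the loop
for html in PAGES:
    records.extend(parse_page(html))

df = pd.DataFrame(records, columns=['title', 'summary'])
print(df)
```

In the real scraper, the loop over `PAGES` would be replaced by the `requests.get(url+str(i))` loop, passing `page.text` to `parse_page`.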

