
Web scraping information from multiple pages into a pandas dataframe

I would like to write code that scrapes data from multiple pages of a job listing site. Currently, however, when I run my code I only get the results from the last page rather than from all the pages I scraped.

This is my code:

url = 'https://ng.indeed.com/jobs?q=Business+Intelligence+Analyst&l=Nigeria&start='
for i in range(0,80,10):
    page = requests.get(url+str(i))
    soup = BeautifulSoup(page.text, 'html.parser')
    jobs = []
    for div in soup.find_all(name='div',attrs={'class':'row'}):
        for a in div.find_all(name='a', attrs={'data-tn-element':'jobTitle'}):
            jobs.append(a['title'])
    summaries = []
    divs = soup.findAll('div', attrs={'class':'summary'})
    for d in divs:
        summaries.append(d.text.strip())
jobs = pd.DataFrame(
    {'title': extract_title(soup),
     'summary': extract_summary(soup)
    })
jobs

I use the outer for loop to iterate through the pages (the start offset is 10 for page 2, 20 for page 3, etc.). The ideal output is a data frame listing the title and summary of every job. However, I only get a data frame with the jobs from the last page.
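Two things in the code above appear to cause this: the jobs and summaries lists are re-created on every pass of the outer loop, and the DataFrame is built from soup, which after the loop refers only to the last page fetched. The effect can be reproduced without any network access. The following is a minimal sketch using made-up page data (fake_pages and the job names are hypothetical, not taken from the site):

```python
# Made-up page contents standing in for the parsed results of each request.
fake_pages = {
    0: ['Job A', 'Job B'],
    10: ['Job C'],
    20: ['Job D', 'Job E'],
}

# Pattern from the question: the list is re-created on every pass,
# so whatever earlier pages added is thrown away.
for start in fake_pages:
    reset_each_time = []
    for title in fake_pages[start]:
        reset_each_time.append(title)

# Accumulating pattern: the list is created once, before the loop,
# so every page appends to the same list.
accumulated = []
for start in fake_pages:
    for title in fake_pages[start]:
        accumulated.append(title)

print(reset_each_time)  # only the titles from the last page
print(accumulated)      # titles from every page
```

Only the second pattern keeps the results from earlier pages.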

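Create both lists once, before the loop, and build the DataFrame from those lists after the loop finishes:

import pandas as pd
import requests
from bs4 import BeautifulSoup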


summaries = []   # <-- outside of the loop
jobs = []        # <-- outside of the loop

url = 'https://ng.indeed.com/jobs?q=Business+Intelligence+Analyst&l=Nigeria&start='

for i in range(0,80,10):
    page = requests.get(url+str(i))
    soup = BeautifulSoup(page.text, 'html.parser')
    for div in soup.find_all(name='div',attrs={'class':'row'}):
        for a in div.find_all(name='a', attrs={'data-tn-element':'jobTitle'}):
            jobs.append(a['title'])
    divs = soup.findAll('div', attrs={'class':'summary'})
    for d in divs:
        summaries.append(d.text.strip())

jobs = pd.DataFrame({'title': jobs,    # <--- put only jobs here
     'summary': summaries})            # <--- put only summaries here

print(jobs)

Prints:

                                              title                                            summary
0      Analyst, Customer Intelligence (Supervisory)  Provide Intelligence To Support Business Plann...
1                     Business Intelligence Analyst  Demonstrable work experience in business intel...
2                    Manager, Business Intelligence  Provide Business Intelligence Services For CEO...
3           MTNN Need Digital Communication Analyst  Work With Individual Units (Corporate Communic...
4   MARKET RESEARCH & BUSINESS INTELLIGENCE OFFICER  Implement the overall analytics and business i...
..                                              ...                                                ...
80                  Research Analyst and Associates  Experience of Business Intelligence tools.\nIn...
81                                Financial Analyst  Perform market research, data mining, business...
82       Oracle E-business Suite Developer (Fusion)  Work Directly with Business user as an oracle ...
83                          Junior Oracle Developer  Work Directly with Business user as an oracle ...
84                 Credit Analyst at CARS45 Limited  High business research skills acumen.\nUnderst...

[85 rows x 2 columns]
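One caveat with two separate lists: if any job card lacks a title link or a summary, the lists end up with different lengths and pd.DataFrame raises a ValueError. A possible variation is to build one record per card so that each title stays paired with its own summary. The sketch below is only an illustration: parse_cards and sample_pages are hypothetical names, the sample HTML is invented, and it assumes each summary div is nested inside its row div, which the real page may not guarantee.

```python
import pandas as pd
from bs4 import BeautifulSoup


def parse_cards(html):
    """Return one {'title', 'summary'} dict per job card found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    records = []
    for card in soup.find_all('div', attrs={'class': 'row'}):
        link = card.find('a', attrs={'data-tn-element': 'jobTitle'})
        summary = card.find('div', attrs={'class': 'summary'})
        records.append({
            # Missing pieces become None instead of shifting later rows.
            'title': link.get('title') if link else None,
            'summary': summary.get_text(strip=True) if summary else None,
        })
    return records


# Invented stand-ins for two downloaded pages; the second card has no summary.
sample_pages = [
    '<div class="row"><a data-tn-element="jobTitle" title="BI Analyst">x</a>'
    '<div class="summary"> Builds dashboards. </div></div>',
    '<div class="row"><a data-tn-element="jobTitle" title="Data Analyst">y</a></div>',
]

records = []  # created once, before the loop over pages
for html in sample_pages:
    records.extend(parse_cards(html))

df = pd.DataFrame(records, columns=['title', 'summary'])
print(df)
```

In a real run, each html string would come from page.text inside the loop over start offsets shown above.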

