
How to scrape a list in Python?

Apologies in advance if my question isn't phrased properly. I've been trying to scrape this page: https://www.bristol.ac.uk/study/undergraduate/search/search_results?sort=score&sort=UndergraduateCourse-programname&dir=desc&dir=asc&query=&filter%3AUndergraduateCourse-year=2021&page=1

I only want the text of all the course names. The code I've written only gives me the text of the first course, even though I've used find_all. I think this may be because the courses are in a list. Here's my code:

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.bristol.ac.uk/study/undergraduate/search/search_results?sort=score&sort=UndergraduateCourse-programname&dir=desc&dir=asc&query=&filter%3AUndergraduateCourse-year=2021&page=1").text

soup = BeautifulSoup(source, 'lxml')

for div in soup.find_all('div', class_='course-results course-results-ug'):
    course = div.a.text
    print(course)

The output is:

            Accounting and Finance
        , 
            BSc

Please help. I've only just started, so I don't know any advanced coding.
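To see why the loop above prints only one course, here is a minimal offline reproduction. The HTML is a hypothetical, simplified stand-in for the real page (only the class names come from the question), and html.parser is used instead of lxml just to avoid the extra dependency, so treat it as a sketch rather than the page's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one results container holding a list of course links.
# The live page's structure is assumed here, not verified.
html = """
<div class="course-results course-results-ug">
  <ul class="course-results-list">
    <li><a href="#">
        Accounting and Finance
      , 
        BSc
    </a></li>
    <li><a href="#">Aerospace Engineering, BEng</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching div, but there is only one container.
containers = soup.find_all("div", class_="course-results course-results-ug")
print(len(containers))  # prints 1

# div.a is shorthand for div.find("a"): it returns only the FIRST link.
first_only = containers[0].a

# find_all("a") on the same container returns every link inside it.
all_links = containers[0].find_all("a")
print(len(all_links))  # prints 2
```

In other words, the loop runs exactly once (one container), and `.a` picks just the first link inside that container.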

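Your find_all call matches only the single results container div, and div.a returns just the first <a> tag inside it, which is why you get one course. Instead, use the CSS selector .course-results-list a to get every link under the course list: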

import requests
from bs4 import BeautifulSoup


source = requests.get("https://www.bristol.ac.uk/study/undergraduate/search/search_results?sort=score&sort=UndergraduateCourse-programname&dir=desc&dir=asc&query=&filter%3AUndergraduateCourse-year=2021&page=1").text
soup = BeautifulSoup(source, 'lxml')

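for a in soup.select('.course-results-list a'):
    # The link text is "name , degree" padded with newlines/spaces:
    # split on the comma, strip each part, and rejoin with ", "
    course = ', '.join(map(str.strip, a.text.split(',')))
    print(course)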

Prints:

Accounting and Finance, BSc
Accounting and Finance with Professional Placement, BSc
Accounting and Finance with Study Abroad, BSc
Accounting and Management, BSc
Accounting and Management with Professional Placement, BSc
Accounting and Management with Study Abroad, BSc
Aerospace Engineering, BEng
Aerospace Engineering, MEng
Aerospace Engineering with Study Abroad, MEng
Aerospace Engineering with Study Abroad in a Modern Language, MEng
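The same approach can be tried offline and collected into a Python list instead of printed, which is handy if you want to process the names further. This is a sketch under assumptions: the HTML snippet and the helper name course_names are hypothetical, html.parser stands in for lxml to avoid an extra dependency, and select() relies on the soupsieve package, which current versions of BeautifulSoup install alongside it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the live page's structure is assumed.
html = """
<ul class="course-results-list">
  <li><a href="#">
      Accounting and Finance
    , 
      BSc
  </a></li>
  <li><a href="#">
      Aerospace Engineering
    , 
      MEng
  </a></li>
</ul>
"""


def course_names(markup: str) -> list:
    """Return 'Name, Degree' strings for every link under .course-results-list."""
    soup = BeautifulSoup(markup, "html.parser")
    names = []
    for a in soup.select(".course-results-list a"):
        # Split on the comma, strip surrounding whitespace/newlines, rejoin.
        parts = (part.strip() for part in a.get_text().split(","))
        names.append(", ".join(parts))
    return names


courses = course_names(html)
print(courses)  # ['Accounting and Finance, BSc', 'Aerospace Engineering, MEng']
```

Against the live page you would pass the downloaded HTML instead of the sample string, e.g. course_names(source) with source from the requests.get(...).text call above.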

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM