
How do I web-scrape this website using Beautiful Soup

I am trying to scrape a website and print out its events with their time and date.

from bs4 import BeautifulSoup

# Read the saved copy of the events page
with open('events.html', 'r', encoding='utf-8') as html_file:
    content = html_file.read()

soup = BeautifulSoup(content, 'lxml')
# Look for table rows that carry both classes
free_slot = soup.find_all('tr', class_='views-field views-field-title')
for slot in free_slot:
    event_name = slot.a.text
    event_time = slot.time.text

    print(event_name)
    print(event_time)

events.html contains this:

Bystander Intervention: Live Workshop Glasnevin Campus Solas Room, The U Student Support & Development February 15, 13:00 - February 15, 13:50

The HTML is from this website: https://www.dcu.ie/students/events. When I try to run the code, it just returns '[]'.

What happens?

The ResultSet is empty because there is no <tr> with the classes you passed to find_all().
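To illustrate the point, here is a minimal sketch on a made-up fragment. The markup below is an assumption for demonstration only (it is not copied from the real page): it puts the classes on the <td> cells rather than on the <tr>, which is one way the row search can come back empty. It uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the classes sit on the <td> cells, not on the <tr>.
html = """
<table>
  <tr>
    <td class="views-field views-field-title"><a href="/event/1">Bystander Intervention: Live Workshop</a></td>
    <td class="views-field views-field-date"><time>February 15, 13:00</time></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The filter is applied to the tag you search for, so a <tr> without
# those classes is never matched, whatever its children look like.
rows = soup.find_all("tr", class_="views-field views-field-title")
cells = soup.find_all("td", class_="views-field views-field-title")

print(rows)        # no <tr> carries those classes
print(len(cells))  # number of <td> cells that do
```

Checking which tag actually carries the class (for example by printing a single row) is usually the quickest way to spot this kind of mismatch.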

How to fix?

Remove the classes from your find_all() call and iterate over the rows:

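free_slot = soup.find_all('tr')
for slot in free_slot:
    print(slot)  # show the raw row, handy while debugging
    # Skip rows (such as a header row) that lack a link or a <time> tag
    if slot.a is None or slot.time is None:
        continue
    event_name = slot.a.text
    event_time = slot.time.text

    print(event_name)
    print(event_time)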
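If you would rather collect the events than print them, the same loop can be wrapped in a small function. This is only a sketch: extract_events and the sample markup are hypothetical names and data introduced here, and it assumes each event row contains one <a> and one <time> tag, as the loop above does.

```python
from bs4 import BeautifulSoup


def extract_events(html):
    """Return (name, time) pairs for every row that has both an <a> and a <time>."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for row in soup.find_all("tr"):
        link = row.find("a")
        when = row.find("time")
        if link is None or when is None:
            continue  # header row or incomplete row
        # get_text(strip=True) trims surrounding whitespace
        events.append((link.get_text(strip=True), when.get_text(strip=True)))
    return events


# Hypothetical sample with a header row and one event row
sample = """
<table>
  <tr><th>Event</th><th>Event date</th></tr>
  <tr><td><a href="#"> Critical writing </a></td><td><time>February 15, 13:00</time></td></tr>
</table>
"""

events = extract_events(sample)
for name, when in events:
    print(name, "|", when)
```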

How to scrape the table?

You can do it with BeautifulSoup, but to get the contents of a table it is much simpler to use pandas' built-in read_html, which does the job for you:

import pandas as pd
pd.read_html('https://www.dcu.ie/students/events')[0]

Output

Unnamed: 0 | Campus | Venue | Department | Event date
Bystander Intervention: Live Workshop | Glasnevin Campus | Solas Room, The U | Student Support & Development | February 15, 13:00 - February 15, 13:50
Emotional Intelligence: Ways to Ease Stress and Anxiety - Session 2 | Online | Online via Zoom | Student Support & Development | February 15, 13:00 - February 15, 14:00
Critical writing | Online | Online via Zoom | Student Learning | February 15, 13:00 - February 15, 14:00
Skills Session: Ace your Interview Skills | Online | Online | Careers Service | February 15, 13:00 - February 15, 13:50
Bystander Intervention: Live Workshop | St Patrick's Campus | B108, Auditorium | Student Support & Development | February 15, 17:00 - February 15, 17:50
Bystander Intervention: Live Workshop | Glasnevin Campus | Cuilin Room, The U | Student Support & Development | February 15, 18:00 - February 15, 18:50
How to Survive a Technical Interview with Microsoft | Online | Online | Careers Service | February 16, 10:00 - February 16, 11:00
Going Global Job Seach Training Session | Online | Virtual | Careers Service | February 16, 10:00 - February 16, 11:00
Informative session and a Q&A on the Vodafone Ireland Summer Internship Programme 2022. | Online | Online | Careers Service | February 16, 12:00 - February 16, 13:00
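The first column comes back as "Unnamed: 0" because its header cell is empty. As a rough sketch of how you might tidy that up and keep only the event name and date, the example below parses a small made-up table from a string. The markup and the column name "Event" are assumptions for illustration, and read_html needs a parser backend such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the events table, with an empty first header cell
html = """
<table>
  <tr><th></th><th>Campus</th><th>Venue</th><th>Department</th><th>Event date</th></tr>
  <tr><td>Critical writing</td><td>Online</td><td>Online via Zoom</td>
      <td>Student Learning</td><td>February 15, 13:00 - February 15, 14:00</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found
df = pd.read_html(StringIO(html))[0]

# Rename the first column by position, so this does not depend on
# whatever placeholder name pandas gave the empty header
df.columns = ["Event", *df.columns[1:]]

print(df[["Event", "Event date"]])
```

Renaming by position rather than by the placeholder name keeps the snippet working even if pandas labels the empty header differently.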


 