
Python - Web scraping <table>

I am trying to write a Python script that extracts data from my school's substitution plan (the schedule that shows which teachers are absent). Simplified as much as I could, it looks like this:

<table class="mon_list">
  <tr class='list odd'><td class="list inline_header" colspan="8" >Name of the school class</td></tr>
  <tr class='list even'><td>Missing teacher</td><td>Substitute teacher</td><td>something</td></tr>
  <tr class='list odd'><td>Missing teacher</td><td>Substitute teacher</td><td>something</td></tr>

  <tr class='list even'><td class="list inline_header" colspan="8" >Name of the school class</td></tr>
  <tr class='list odd'><td>Missing teacher</td><td>Substitute teacher</td><td>something</td></tr>
  ...
</table>

(the pattern repeats itself for all school classes with a changed schedule)

Link to a cutout of the actual website: https://drive.google.com/file/d/16ZMnTbG6gRo-pGwrvmLSOGxJvedHeNT6/view?usp=sharing

I want all the data in the relevant rows after the name of my class.

I got as far as a loop that iterates through all the <tr> elements and checks whether their content matches a specified string (the name of my school class), but that doesn't help in this case because the relevant rows are not child elements of the matching row; they are separate sibling rows that follow it.

The problem is that it's just one big <table> that lists the schedule changes for every affected class.

Use the pandas library for this. pandas.read_html parses every <table> it finds in an HTML string and returns them as a list of DataFrames:

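import pandas as pd
from io import StringIO
# Recent pandas versions deprecate passing raw HTML strings, so wrap the string in StringIO
raw = pd.read_html(StringIO("html string goes here"))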

The result is a list with one DataFrame per table in the HTML, which you can access by index:

first_table=raw[0]
second_table=raw[1]

and so on, depending on the number of tables in the HTML.
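
Reading the table still leaves the question of picking out the rows for a single class, since every class shares one <table>. Below is a minimal sketch of one way to do that. It uses only the standard library's html.parser rather than pandas, walks the rows in document order, and keeps the rows that follow the matching header until the next header appears. It assumes that header cells carry the inline_header class as in the simplified markup, that the header text equals the class name exactly, and that every <td> and <tr> is explicitly closed. The names ClassRowParser and rows_for_class and the sample data ("10A", "Mr. A", ...) are made up for illustration.

```python
from html.parser import HTMLParser


class ClassRowParser(HTMLParser):
    """Collect the data rows that follow the header row of one school class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.rows = []                # data rows that belong to class_name
        self._row = []                # cell texts of the row being parsed
        self._cell_text = []          # text fragments of the current cell
        self._in_cell = False
        self._row_is_header = False
        self._collecting = False      # True while inside the wanted section

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._row_is_header = False
        elif tag == "td":
            self._in_cell = True
            self._cell_text = []
            css = dict(attrs).get("class") or ""
            # Assumption: a cell with class "inline_header" marks a class header row
            if "inline_header" in css.split():
                self._row_is_header = True

    def handle_data(self, data):
        if self._in_cell:
            self._cell_text.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
            self._row.append("".join(self._cell_text).strip())
        elif tag == "tr":
            if self._row_is_header:
                # A header row starts a new section; collect only if it is ours
                self._collecting = bool(self._row) and self._row[0] == self.class_name
            elif self._collecting and self._row:
                self.rows.append(self._row)


def rows_for_class(html, class_name):
    """Return the data rows listed under class_name as lists of cell strings."""
    parser = ClassRowParser(class_name)
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample in the shape of the simplified table from the question
sample = """
<table class="mon_list">
  <tr class='list odd'><td class="list inline_header" colspan="8" >10A</td></tr>
  <tr class='list even'><td>Mr. A</td><td>Ms. B</td><td>Room 1</td></tr>
  <tr class='list odd'><td>Mr. C</td><td>Ms. D</td><td>Room 2</td></tr>
  <tr class='list even'><td class="list inline_header" colspan="8" >10B</td></tr>
  <tr class='list odd'><td>Mr. E</td><td>Ms. F</td><td>Room 3</td></tr>
</table>
"""

print(rows_for_class(sample, "10A"))
# [['Mr. A', 'Ms. B', 'Room 1'], ['Mr. C', 'Ms. D', 'Room 2']]
```

If the real page puts extra text in the header cell, the exact comparison against class_name would need to be loosened, for example to a substring check.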
