
Extracting Data From a Website Table in Python: Rows Not Showing

I tried several ways to access the table rows, but none of them worked.

import pandas as pd

url = "https://programsandcourses.anu.edu.au/catalogue"

d = pd.read_html(url, header=0, flavor='bs4')

print(d)

The row data is not shown. The output is only the following:

[                  Code  ...             Delivery
0  Show all results...  ...  Show all results...

[1 rows x 7 columns],                   Code  ...             Delivery
0  Show all results...  ...  Show all results...

[1 rows x 6 columns],                   Code                Title  ...               Career Units
0  Show all results...  Show all results...  ...  Show all results...   NaN

[1 rows x 5 columns],                   Code                Title  ...               Career Units
0  Show all results...  Show all results...  ...  Show all results...   NaN

[1 rows x 5 columns],                   Code                Title  ...               Career Units
0  Show all results...  Show all results...  ...  Show all results...   NaN

[1 rows x 5 columns]]

How can I access the data so that I can store it in a CSV file? Does it need any permissions?

The content is probably loaded dynamically, so it is hard to fetch with pandas or BeautifulSoup. Here is an approach you can follow:

  1. Open Chrome developer tools, refresh the page, then go to the Network tab and click XHR. The requests are listed under the Name column.

  2. Click on the requests. The first one returns only the first 20 records.

  3. Since you want all 416 records, click "Show all results" on the web page. A new request will appear under XHR; this is the link used in the code below, and its response is JSON.

  4. Click on that request and copy the link address. You can now extract whatever data you want from the JSON.
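The link copied in step 4 carries a long query string, and it can help to split it into named parameters before reusing it. The sketch below does that with the standard library. The URL here is a shortened version that keeps only a few of the parameters from the copied link; whether the server accepts this reduced set is an assumption that has not been checked, and the meaning of `ShowAll` and `PageSize` is only inferred from their names.

```python
from urllib.parse import urlsplit, parse_qsl

# Shortened version of the copied link (assumption: only a few of the
# original parameters are kept here, for readability).
url = (
    "https://programsandcourses.anu.edu.au/data/ProgramSearch/GetPrograms"
    "?AppliedFilter=FilterByPrograms&ShowAll=true&PageIndex=0"
    "&MaxPageSize=20&PageSize=Infinity&SelectedYear=2021"
    "&CollegeName=All+Colleges&ModeOfDelivery=All+Modes"
)

parts = urlsplit(url)
# Endpoint without the query string.
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
# Query string as a dict; blank values are kept, "+" is decoded to a space.
params = dict(parse_qsl(parts.query, keep_blank_values=True))

print(endpoint)
for name, value in params.items():
    print(f"{name} = {value!r}")
```

The resulting `endpoint` and `params` can then be passed to `requests.get(endpoint, params=params)` instead of one long string. Note that a plain dict keeps only the last value of a repeated parameter name, which matters if the full link repeats any names.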

Code:

import requests
res=requests.get("https://programsandcourses.anu.edu.au/data/ProgramSearch/GetPrograms?q=&client=anu_frontend&proxystylesheet=anu_frontend&site=default_collection&btnG=Search&filter=0&q=&client=anu_frontend&proxystylesheet=anu_frontend&site=default_collection&btnG=Search&filter=0&AppliedFilter=FilterByPrograms&Source=&ShowAll=true&PageIndex=0&MaxPageSize=20&PageSize=Infinity&SortColumn=&SortDirection=&InitailSearchRequestedFromExternalPage=true&SearchText=&SelectedYear=2021&Careers%5B0%5D=&Careers%5B1%5D=&Careers%5B2%5D=&Careers%5B3%5D=&Sessions%5B0%5D=&Sessions%5B1%5D=&Sessions%5B2%5D=&Sessions%5B3%5D=&Sessions%5B4%5D=&Sessions%5B5%5D=&DegreeIdentifiers%5B0%5D=&DegreeIdentifiers%5B1%5D=&DegreeIdentifiers%5B2%5D=&FilterByMajors=&FilterByMinors=&FilterBySpecialisations=&CollegeName=All+Colleges&ModeOfDelivery=All+Modes")
main_json = res.json()  # parse the JSON response into a dict
print(len(main_json['Items']))  # number of records returned
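Since the question asks about storing the data in a CSV file, here is a minimal sketch of that last step using only the standard library. It assumes that `main_json['Items']` is a list of flat dictionaries; the helper name `items_to_csv` and the output file name are hypothetical, and the actual field names depend on what the endpoint returns.

```python
import csv

def items_to_csv(items, path):
    """Write a list of flat dicts to a CSV file, one row per dict."""
    # Collect column names in first-seen order, since records may not
    # all have exactly the same keys.
    fieldnames = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval fills in an empty cell when a record lacks a column.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(items)
    return len(items)

# Usage with the response from the code above (hypothetical file name):
# items_to_csv(main_json['Items'], "programs.csv")
```

If pandas is already in use, `pd.DataFrame(main_json['Items']).to_csv("programs.csv", index=False)` is a shorter alternative under the same assumption about the shape of the records.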

[Image: the approach described in step 3]
