
Extract Data From Website Table Python Not Showing Rows

I have tried several ways to access the table rows, but none of them worked.

import pandas as pd

url = "https://programsandcourses.anu.edu.au/catalogue"
d = pd.read_html(url, header=0, flavor="bs4")
print(d)

The row data is not shown; the output is only the following:

[                  Code  ...             Delivery
0  Show all results...  ...  Show all results...

[1 rows x 7 columns],                   Code  ...             Delivery
0  Show all results...  ...  Show all results...

[1 rows x 6 columns],                   Code                Title  ...               Career Units
0  Show all results...  Show all results...  ...  Show all results...   NaN

[1 rows x 5 columns],                   Code                Title  ...               Career Units
0  Show all results...  Show all results...  ...  Show all results...   NaN

[1 rows x 5 columns],                   Code                Title  ...               Career Units
0  Show all results...  Show all results...  ...  Show all results...   NaN

[1 rows x 5 columns]]

How can I access the data so that I can store it in a CSV file? Does it need any permissions?

The content is probably dynamic (loaded by a separate request after the page itself), so it is hard to fetch with pandas or BeautifulSoup. Here is an approach you can follow:

  1. Open Chrome developer tools, refresh the page, then go to the Network tab and click XHR. The requests are listed under the Name column.

  2. Click on the links; the first one contains only the first 20 records.

  3. Since you want all 416 records, go to the web page and click "Show all results". A new XHR link appears (the one used in the code); its response is JSON.

  4. Click on that link and copy the link address. You can now extract whatever data you want from the JSON.
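Before using the copied link, it can help to look at which query parameters it carries. The following is a minimal sketch using only the standard library; the link here is a shortened copy kept for illustration (the parameters shown are taken from the full link, which has many more), and the names `link`, `parts`, and `params` are just illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the link, for illustration only;
# the full link copied from the Network tab carries many more parameters.
link = (
    "https://programsandcourses.anu.edu.au/data/ProgramSearch/GetPrograms"
    "?ShowAll=true&PageIndex=0&MaxPageSize=20&PageSize=Infinity&SelectedYear=2021"
)

parts = urlsplit(link)
# keep_blank_values=True keeps parameters that are present but empty
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.path)
for name, values in params.items():
    print(name, "=", values)
```

Comparing the parameters of the first link (20 records) with those of the "Show all results" link is one way to see which ones appear to control how many records come back.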

Code:

import requests
res=requests.get("https://programsandcourses.anu.edu.au/data/ProgramSearch/GetPrograms?q=&client=anu_frontend&proxystylesheet=anu_frontend&site=default_collection&btnG=Search&filter=0&q=&client=anu_frontend&proxystylesheet=anu_frontend&site=default_collection&btnG=Search&filter=0&AppliedFilter=FilterByPrograms&Source=&ShowAll=true&PageIndex=0&MaxPageSize=20&PageSize=Infinity&SortColumn=&SortDirection=&InitailSearchRequestedFromExternalPage=true&SearchText=&SelectedYear=2021&Careers%5B0%5D=&Careers%5B1%5D=&Careers%5B2%5D=&Careers%5B3%5D=&Sessions%5B0%5D=&Sessions%5B1%5D=&Sessions%5B2%5D=&Sessions%5B3%5D=&Sessions%5B4%5D=&Sessions%5B5%5D=&DegreeIdentifiers%5B0%5D=&DegreeIdentifiers%5B1%5D=&DegreeIdentifiers%5B2%5D=&FilterByMajors=&FilterByMinors=&FilterBySpecialisations=&CollegeName=All+Colleges&ModeOfDelivery=All+Modes")
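res.raise_for_status()  # stop early if the request failed
main_json = res.json()  # parse the JSON response into a dict
print(len(main_json['Items']))  # number of records returned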
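To store the result in a CSV file, as the question asks, the list under `Items` can be written out with the standard `csv` module. This is only a sketch: it assumes each item is a flat dict, and the `sample` payload and its field names (`Code`, `Title`, `Career`) are hypothetical stand-ins for the real `res.json()` response, whose fields may differ. The helper name `items_to_csv` is also made up for this example.

```python
import csv

def items_to_csv(payload, path):
    """Write payload['Items'] (a list of dicts) to a CSV file; return the row count."""
    items = payload.get("Items", [])
    # Collect column names in first-seen order, since records may not share all keys.
    fieldnames = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(items)
    return len(items)

# Hypothetical sample standing in for res.json(); real field names may differ.
sample = {"Items": [
    {"Code": "X1", "Title": "Example Program A"},
    {"Code": "X2", "Title": "Example Program B", "Career": "Undergraduate"},
]}
items_to_csv(sample, "programs_sample.csv")
```

With the real response, the call would be `items_to_csv(main_json, "programs.csv")`. If some fields turn out to be nested lists or dicts, they would be written as their string representation and may need flattening first.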

Image: the approach from step 3


 