Missing data using Beautiful Soup

I'm trying to get the university names, scores and country names from this website: https://roundranking.com/ranking/world-university-rankings.html#world-2021. I can find the table that holds the data by its class, but the contents of the table's <tbody> disappear when I look for them with Beautiful Soup.

Here is the original HTML:

<table class="big-table table-sortable uci" style="padding: 0px;">
<thead class="tableFloatingHeaderOriginal">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead><thead class="tableFloatingHeader" style="display: none; opacity: 0;">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead>
<tbody>
<tr class="az-row-100"><td class="td1">1</td><td class="td2"><a href="/universities/harvard-university.html?sort=O&amp;year=2021&amp;subject=SO">Harvard University</a></td><td class="td3">100.000</td><td class="td4">USA</td><td class="td6"><img src="../images_rur/Flag/Flag_USA.png" alt=""></td><td class="td7">Diamond League</td>
...
</tbody>
</table>

And here is the HTML that the soup shows:

<table class="big-table table-sortable uci" style="padding: 0px;">
<thead class="tableFloatingHeaderOriginal">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead><thead class="tableFloatingHeader" style="display: none; opacity: 0;">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead>
</table>

My Python code that tries to get the data:

import selenium
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome('./chromedriver.exe')
driver.get('https://roundranking.com/ranking/world-university-rankings.html#world-2021')

source = driver.page_source
soup=BeautifulSoup(source)
#soup = BeautifulSoup(source, 'html5lib')
#soup = BeautifulSoup(source, 'html.parser')
#soup = BeautifulSoup(source, 'lxml')

soup.prettify

table=soup.find('table', {'class':'big-table table-sortable uci'})
print(table)

I've tried html5lib, lxml and html.parser, but none of them works: when I print the table, it does not contain the <tbody>, which holds the data I need.

The table is generated by JavaScript, so its rows are loaded separately from the page's HTML. You can find the request that fetches the data in your browser's developer tools. Here is an example:

import requests

# Endpoint that the page's JavaScript queries for the ranking data
url = "https://roundranking.com/final/ranking-json18r.php"

payload = "t=2021&s=O&sa=SO&sc=All+Countries"
response = requests.request("POST", url, data=payload)
# The response is a JSON list with one record per university
for university in response.json():
    print(university['rank'], university['univ'], university['score'], university['economy'], university['league'])

Output:

1 Harvard University 100.0 USA Diamond League
2 California Institute of Technology (Caltech) 98.137 USA Diamond League
3 Imperial College London 97.706 UK Diamond League
4 Stanford University 97.604 USA Diamond League
5 Yale University 97.506 USA Diamond League
6 Massachusetts Institute of Technology (MIT) 97.364 USA Diamond League
7 ETH Zurich (Swiss Federal Institute of Technology) 96.187 Switzerland Diamond League
8 Columbia University 95.393 USA Diamond League
9 University of Cambridge 95.258 UK Diamond League
10 University of Oxford 94.989 UK Diamond League
11 University of Chicago 94.712 USA Diamond League
12 Karolinska Institute 94.642 Sweden Diamond League
13 Johns Hopkins University 94.299 USA Diamond League
14 University College London 94.172 UK Diamond League
15 Northwestern University 94.117 USA Diamond League
16 Princeton University 93.993 USA Diamond League
17 Ecole Polytechnique Federale de Lausanne 93.75 Switzerland Diamond League
18 University of Pennsylvania 93.525 USA Diamond League
19 Cornell University 92.271 USA Diamond League
20 Washington University in St. Louis 91.325 USA Diamond League
21 Carnegie Mellon University 90.608 USA Diamond League
22 Scuola Normale Superiore di Pisa 90.345 Italy Diamond League
23 Case Western Reserve University 90.314 USA Diamond League
24 University of Michigan 89.447 USA Diamond League
25 Boston University 89.443 USA Diamond League
26 Brown University 89.043 USA Diamond League
27 Technical University of Denmark 88.842 Denmark Diamond League
...
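
Since the question asks only for the university names, scores and country names, the records can be reduced to those three fields and saved. The following is a minimal sketch, not part of the original answer: the helper names `extract_rows` and `rows_to_csv` are hypothetical, the field keys `univ`, `score` and `economy` are taken from the answer's code, and the two sample records are copied from the output above with their value types (integer rank, float score) assumed rather than confirmed.

```python
import csv
import io


def extract_rows(records):
    """Keep only the name, score and country of each university record."""
    return [
        {"university": r["univ"], "score": r["score"], "country": r["economy"]}
        for r in records
    ]


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["university", "score", "country"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two records copied from the output above, standing in for response.json().
# The value types here are an assumption.
sample = [
    {"rank": 1, "univ": "Harvard University", "score": 100.0,
     "economy": "USA", "league": "Diamond League"},
    {"rank": 2, "univ": "California Institute of Technology (Caltech)",
     "score": 98.137, "economy": "USA", "league": "Diamond League"},
]

rows = extract_rows(sample)
print(rows_to_csv(rows))
```

With the live data, `extract_rows(response.json())` would take the place of the sample list, provided the response keeps the same keys.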
