Why can't BeautifulSoup find a table on this webpage?
I'm trying to scrape a table from this website using Google Colab, but when I run the code below I get empty brackets back.
import urllib.request as url
from bs4 import BeautifulSoup

page = 'https://www.stadiumgaming.gg/rank-checker?pokemon=Walrein'
html = url.urlopen(page)
soup = BeautifulSoup(html, 'html5lib').findAll('td')
print(soup)
Output: []
How can I find the table on this page so that I can parse it into a dataframe?
BeautifulSoup can't find the table on this webpage because it is populated dynamically by JavaScript, and bs4 cannot execute JS. You can, however, render the page with selenium and then parse it with bs4 and pandas:
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)

url = 'https://www.stadiumgaming.gg/rank-checker?pokemon=WALREIN'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # give the JS-rendered table time to load

soup = BeautifulSoup(driver.page_source, 'lxml')
df = pd.read_html(str(soup))[1]  # the rank table is the second <table> on the page
driver.quit()
print(df.iloc[1:, 0:9])
Output:
    Rank       IVs    CP   Lvl       %     Atk     Def  Sta    Prod
1   2072  10/10/10  1483    20   94.97  114.70  111.12  150  1911.8
2      1   0/12/15  1499    21  100.00  111.41  115.09  157  2013.1
3      2   0/13/14  1500    21   99.89  111.41  115.70  156  2010.9
4      3   0/13/13  1497    21   99.89  111.41  115.70  156  2010.9
5      4   0/14/12  1498    21   99.78  111.41  116.31  155  2008.6
6      5   1/14/10  1500    21   99.68  112.02  116.31  154  2006.6
7      6   0/15/11  1499    21   99.65  111.41  116.92  154  2006.1
8      7   0/15/10  1496    21   99.65  111.41  116.92  154  2006.1
9      8    1/15/8  1498    21   99.55  112.02  116.92  153    2004
10     9   3/15/15  1499  20.5   99.53  111.89  115.52  155  2003.5
11    10   1/10/15  1499    21   99.48  112.02  113.86  157  2002.6
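Note that `pd.read_html` itself needs no browser: selenium is only there to execute the JavaScript so that real `<table>` markup exists in `driver.page_source`. A minimal sketch of the parsing step in isolation, using a hypothetical static HTML snippet in place of the rendered page source:

```python
from io import StringIO

import pandas as pd

# Hypothetical static HTML standing in for driver.page_source,
# just to show that read_html parses any <table> markup it is given.
html = """
<table>
  <tr><th>Rank</th><th>IVs</th><th>CP</th></tr>
  <tr><td>1</td><td>0/12/15</td><td>1499</td></tr>
  <tr><td>2</td><td>0/13/14</td><td>1500</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found
df = pd.read_html(StringIO(html))[0]
print(df)
```

This is also why the original `urllib` attempt printed `[]`: the HTML it downloaded contained no `<td>` elements at all, because the table only exists after the page's JavaScript has run.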