Why can't BeautifulSoup find a table on this webpage?
I'm trying to scrape a table from this website using Google Colab, but when I run the code below I get empty brackets back.
import urllib.request as url
from bs4 import BeautifulSoup

page = 'https://www.stadiumgaming.gg/rank-checker?pokemon=Walrein'
html = url.urlopen(page)
soup = BeautifulSoup(html, 'html5lib').findAll('td')
print(soup)
Output: []
How can I find the table on this page so that I can parse it into a dataframe?
BeautifulSoup can't find the table on this webpage because it is populated dynamically by JavaScript, and bs4 cannot execute JS. You can, however, render the page with selenium and then parse it with bs4 and pandas:
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)

url = 'https://www.stadiumgaming.gg/rank-checker?pokemon=WALREIN'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # give the JS-rendered table time to load

soup = BeautifulSoup(driver.page_source, 'lxml')
df = pd.read_html(str(soup))[1]  # the rank table is the second <table> on the page
driver.quit()
print(df.iloc[1:, 0:9])
Output:
    Rank       IVs    CP   Lvl       %     Atk     Def  Sta    Prod
1   2072  10/10/10  1483    20   94.97  114.70  111.12  150  1911.8
2      1   0/12/15  1499    21  100.00  111.41  115.09  157  2013.1
3      2   0/13/14  1500    21   99.89  111.41  115.70  156  2010.9
4      3   0/13/13  1497    21   99.89  111.41  115.70  156  2010.9
5      4   0/14/12  1498    21   99.78  111.41  116.31  155  2008.6
6      5   1/14/10  1500    21   99.68  112.02  116.31  154  2006.6
7      6   0/15/11  1499    21   99.65  111.41  116.92  154  2006.1
8      7   0/15/10  1496    21   99.65  111.41  116.92  154  2006.1
9      8    1/15/8  1498    21   99.55  112.02  116.92  153    2004
10     9   3/15/15  1499  20.5   99.53  111.89  115.52  155  2003.5
11    10   1/10/15  1499    21   99.48  112.02  113.86  157  2002.6
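Note that `pd.read_html` itself needs no browser: selenium is only there to execute the JavaScript so that real `<table>` markup exists in `driver.page_source`. A minimal sketch of the parsing step in isolation, using a hypothetical static HTML snippet in place of the rendered page source:

```python
from io import StringIO

import pandas as pd

# Hypothetical static HTML standing in for driver.page_source,
# just to show that read_html parses any <table> markup it is given.
html = """
<table>
  <tr><th>Rank</th><th>IVs</th><th>CP</th></tr>
  <tr><td>1</td><td>0/12/15</td><td>1499</td></tr>
  <tr><td>2</td><td>0/13/14</td><td>1500</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found
df = pd.read_html(StringIO(html))[0]
print(df)
```

This is also why the original `urllib` attempt printed `[]`: the HTML it downloaded contained no `<td>` elements at all, because the table only exists after the page's JavaScript has run.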