
Why can't BeautifulSoup find a table on this webpage?

I'm trying to use Google Colab to scrape a table from this website, but when I run the code below I get an empty list.

import urllib.request as url
from bs4 import BeautifulSoup

page = 'https://www.stadiumgaming.gg/rank-checker?pokemon=Walrein'
html = url.urlopen(page)
soup = BeautifulSoup(html, 'html5lib').find_all('td')
print(soup)

Output: []

How can I find the table on this page so that it can be parsed into a dataframe?

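BeautifulSoup can't find the table because the page populates it dynamically with JavaScript. bs4 only parses the HTML it is given and does not execute scripts, so the response from urlopen contains no td elements yet. You can instead let Selenium render the page in a real browser and then pass the rendered source to bs4 and pandas.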
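To make the cause concrete, here is a minimal sketch that counts td tags in two pieces of markup. Both HTML strings, the `ranks` id and the `loadRanks` call are made up for illustration and are not taken from the real site. The sketch uses the standard-library parser so that it runs without extra packages, but the same reasoning applies to bs4: a parser sees only the tags present in the text it receives.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts <td> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells += 1


def count_cells(html: str) -> int:
    """Return how many <td> elements the parser sees in `html`."""
    parser = CellCounter()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical markup: what a server might send before any script has run.
STATIC_SHELL = """
<html><body>
  <table id="ranks"><tbody></tbody></table>
  <script>
    // In a browser this would request the data and build the rows.
    loadRanks("Walrein");
  </script>
</body></html>
"""

# Hypothetical markup: the same page after a browser has run the script.
RENDERED = """
<html><body>
  <table id="ranks"><tbody>
    <tr><td>1</td><td>0/12/15</td><td>1499</td></tr>
  </tbody></table>
</body></html>
"""

print(count_cells(STATIC_SHELL))  # 0
print(count_cells(RENDERED))      # 3
```

The first count is what urlopen plus bs4 effectively sees, and the second is what driver.page_source provides after rendering. The Selenium version from this answer follows.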

import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://www.stadiumgaming.gg/rank-checker?pokemon=WALREIN'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # give the JavaScript time to fill in the table

# Parse the rendered page, not the raw server response
table = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# read_html returns one DataFrame per <table>; this answer uses the second one.
# Recent pandas versions expect a file-like object, hence StringIO.
df = pd.read_html(StringIO(str(table)))[1]
print(df.iloc[1:, 0:9])

Result:

    Rank       IVs    CP   Lvl       %     Atk     Def  Sta    Prod
1   2072  10/10/10  1483    20   94.97  114.70  111.12  150  1911.8
2      1   0/12/15  1499    21  100.00  111.41  115.09  157  2013.1
3      2   0/13/14  1500    21   99.89  111.41  115.70  156  2010.9
4      3   0/13/13  1497    21   99.89  111.41  115.70  156  2010.9
5      4   0/14/12  1498    21   99.78  111.41  116.31  155  2008.6
6      5   1/14/10  1500    21   99.68  112.02  116.31  154  2006.6
7      6   0/15/11  1499    21   99.65  111.41  116.92  154  2006.1
8      7   0/15/10  1496    21   99.65  111.41  116.92  154  2006.1
9      8    1/15/8  1498    21   99.55  112.02  116.92  153    2004
10     9   3/15/15  1499  20.5   99.53  111.89  115.52  155  2003.5
11    10   1/10/15  1499    21   99.48  112.02  113.86  157  2002.6
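
A fixed time.sleep(3) can be too short on a slow connection and wastes time on a fast one. As a hedged alternative, the sketch below waits explicitly until at least one table cell exists, and runs Chrome headless, which is usually needed on Google Colab where there is no display. The function name `fetch_rendered_html`, the `table td` selector and the 15-second timeout are assumptions chosen for illustration. It also assumes that Chrome is installed and that Selenium can locate a driver by itself (Selenium 4.6 and later); otherwise pass a Service with your chromedriver path as in the code above.

```python
def fetch_rendered_html(url, timeout=15, selector="table td"):
    """Return the page source once an element matching `selector` exists."""
    # Imported here so that defining the function does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # no display available, e.g. on Colab
    options.add_argument("--no-sandbox")    # often required when running as root

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the script has inserted at least one cell,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser
```

The returned string can be wrapped in StringIO and passed to pd.read_html in the same way as the page source above.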

