
Web Scraping a page with multiple tables

I am trying to scrape the second table from this website: https://fbref.com/en/comps/9/stats/Premier-League-Stats. However, I have only ever managed to extract the first table when looking up elements by the table tag. Could anyone explain why I cannot access the second table, or show me how to do it?

import requests 
from bs4 import BeautifulSoup
url = "https://fbref.com/en/comps/9/stats/Premier-League-Stats"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
tables = soup.find_all("table")
player_table = tables[0]

Something along these lines should do it:

tables = soup.find_all("table")  # returns a list of tables
second_table = tables[1]

The table is inside an HTML comment (<!-- ... -->), so it never shows up as a table element in the parsed page.
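
A parser keeps everything inside a comment as one text node rather than as elements, which is why find_all("table") skips it. Here is a minimal offline sketch of that behaviour, using made-up HTML (the ids "visible" and "hidden" are illustrative and not taken from fbref):

```python
from bs4 import BeautifulSoup

# Made-up HTML: one ordinary table and one table wrapped in a comment.
html = """
<div id="all_visible"><table id="visible"><tr><td>a</td></tr></table></div>
<div id="all_hidden"><!--
<table id="hidden"><tr><td>b</td></tr></table>
--></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Only the table that exists as a real element is returned.
found_ids = [t.get("id") for t in soup.find_all("table")]
print(found_ids)  # ['visible']
```

Indexing the result with [1] would raise an IndexError here, because the commented table is not in the list at all.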

To get the table from comments, you can use this example:

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

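# The stats table is shipped inside an HTML comment, so grab the first comment
# that follows the '#all_stats_standard' wrapper and parse it as its own document.
comment = soup.select_one('#all_stats_standard').find_next(string=lambda x: isinstance(x, Comment))
table = BeautifulSoup(comment, 'html.parser')

# Print the player name, squad and birth year from each data row: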
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))

Prints:

Patrick van Aanholt           Crystal Palace      1990      
Max Aarons                    Norwich City        2000      
Tammy Abraham                 Chelsea             1997      
Che Adams                     Southampton         1996      
Adrián                        Liverpool           1987      
Sergio Agüero                 Manchester City     1988      
Albian Ajeti                  West Ham            1997      
Nathan Aké                    Bournemouth         1995      
Marc Albrighton               Leicester City      1989      
Toby Alderweireld             Tottenham           1989      

...and so on.
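
If more than one table on a page is hidden this way, the same idea can be applied to every comment at once. The following is only a sketch under stated assumptions: the helper name tables_from_comments and the sample HTML (including the id "stats_demo") are hypothetical, and the live page may be laid out differently.

```python
from bs4 import BeautifulSoup, Comment


def tables_from_comments(html):
    """Return {table id: Tag} for every <table> found inside an HTML comment."""
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" not in comment:
            continue  # ordinary comment, nothing to parse
        # Re-parse the comment text as its own small document.
        inner = BeautifulSoup(comment, "html.parser")
        for table in inner.find_all("table"):
            found[table.get("id")] = table
    return found


# Hypothetical sample page: one plain comment and one commented-out table.
sample = """
<div><!-- just a note --></div>
<div id="all_stats_demo"><!--
<table id="stats_demo">
<tr><th>Player</th><th>Squad</th></tr>
<tr><td>Max Aarons</td><td>Norwich City</td></tr>
</table>
--></div>
"""

tables = tables_from_comments(sample)
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in tables["stats_demo"].find_all("tr")
    if tr.find("td")  # skip the header row, which has only <th> cells
]
print(rows)  # [['Max Aarons', 'Norwich City']]
```

Keying the result by table id makes it easy to pick one table by name instead of relying on its position in the page.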


 