
Scraping a table with Beautiful Soup 4

Hello, I am trying to scrape the table at this URL: https://www.espn.com/nfl/stats/player/_/stat/rushing/season/2018/seasontype/2/table/rushing/sort/rushingYards/dir/desc

There are 50 rows in this table. However, if you click Show more (just below the table), more rows appear. My Beautiful Soup code works fine, but it retrieves only the first 50 rows; it does not retrieve the rows that appear after clicking Show more. How can I get all the rows, including the first 50 and those that appear after clicking Show more? Here is the code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Request the target ESPN stats page
rqst = requests.get("https://www.espn.com/nfl/stats/player/_/stat/rushing/season/2018/seasontype/2/table/rushing/sort/rushingYards/dir/desc")
soup = BeautifulSoup(rqst.content, 'lxml')
table = soup.find_all('table')
NFL_player_stats = pd.read_html(str(table))
players = NFL_player_stats[0]
players.shape
out[0]:  (50,1)

Using DevTools in Firefox, I can see that the page gets the data for the next page (in JSON format) from

https://site.web.api.espn.com/apis/common/v3/sports/football/nfl/statistics/byathlete?region=us&lang=en&contentorigin=espn&isqualified=false&limit=50&category=offense%3Arushing&sort=rushing.rushingYards%3Adesc&season=2018&seasontype=2&page=2

If you change the value of page=, you can get the other pages.

import requests

url = 'https://site.web.api.espn.com/apis/common/v3/sports/football/nfl/statistics/byathlete?region=us&lang=en&contentorigin=espn&isqualified=false&limit=50&category=offense%3Arushing&sort=rushing.rushingYards%3Adesc&season=2018&seasontype=2&page='

for page in range(1, 4):
    print('\n---', page, '---\n')

    r = requests.get(url + str(page))
    data = r.json()

    #print(data.keys())

    for item in data['athletes']:
        print(item['athlete']['displayName'])

Result:

--- 1 ---

Ezekiel Elliott
Saquon Barkley
Todd Gurley II
Joe Mixon
Chris Carson
Christian McCaffrey
Derrick Henry
Adrian Peterson
Phillip Lindsay
Nick Chubb
Lamar Miller
James Conner
David Johnson
Jordan Howard
Sony Michel
Marlon Mack
Melvin Gordon
Alvin Kamara
Peyton Barber
Kareem Hunt
Matt Breida
Tevin Coleman
Aaron Jones
Doug Martin
Frank Gore
Gus Edwards
Lamar Jackson
Isaiah Crowell
Mark Ingram II
Kerryon Johnson
Josh Allen
Dalvin Cook
Latavius Murray
Carlos Hyde
Austin Ekeler
Deshaun Watson
Kenyan Drake
Royce Freeman
Dion Lewis
LeSean McCoy
Mike Davis
Josh Adams
Alfred Blue
Cam Newton
Jamaal Williams
Tarik Cohen
Leonard Fournette
Alfred Morris
James White
Mitchell Trubisky

--- 2 ---

Rashaad Penny
LeGarrette Blount
T.J. Yeldon
Alex Collins
C.J. Anderson
Chris Ivory
Marshawn Lynch
Russell Wilson
Blake Bortles
Wendell Smallwood
Marcus Mariota
Bilal Powell
Jordan Wilkins
Kenneth Dixon
Ito Smith
Nyheim Hines
Dak Prescott
Jameis Winston
Elijah McGuire
Patrick Mahomes
Aaron Rodgers
Jeff Wilson Jr.
Zach Zenner
Raheem Mostert
Corey Clement
Jalen Richard
Damien Williams
Jaylen Samuels
Marcus Murphy
Spencer Ware
Cordarrelle Patterson
Malcolm Brown
Giovani Bernard
Chase Edmonds
Justin Jackson
Duke Johnson
Taysom Hill
Kalen Ballage
Ty Montgomery
Rex Burkhead
Jay Ajayi
Devontae Booker
Chris Thompson
Wayne Gallman
DJ Moore
Theo Riddick
Alex Smith
Robert Woods
Brian Hill
Dwayne Washington

--- 3 ---

Ryan Fitzpatrick
Tyreek Hill
Andrew Luck
Ryan Tannehill
Josh Rosen
Sam Darnold
Baker Mayfield
Jeff Driskel
Rod Smith
Matt Ryan
Tyrod Taylor
Kirk Cousins
Cody Kessler
Darren Sproles
Josh Johnson
DeAndre Washington
Trenton Cannon
Javorius Allen
Jared Goff
Julian Edelman
Jacquizz Rodgers
Kapri Bibbs
Andy Dalton
Ben Roethlisberger
Dede Westbrook
Case Keenum
Carson Wentz
Brandon Bolden
Curtis Samuel
Stevan Ridley
Keith Ford
Keenan Allen
John Kelly
Kenjon Barner
Matthew Stafford
Tyler Lockett
C.J. Beathard
Cameron Artis-Payne
Devonta Freeman
Brandin Cooks
Isaiah McKenzie
Colt McCoy
Stefon Diggs
Taylor Gabriel
Jarvis Landry
Tavon Austin
Corey Davis
Emmanuel Sanders
Sammy Watkins
Nathan Peterman
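As a side note, the long URL used above can be composed from its individual query parameters, which makes it easier to see what page= (and the other values) control. This is only a sketch: build_url is a hypothetical helper, not part of any library, and whether the endpoint accepts other values for season or limit is an assumption that has not been checked here.

```python
from urllib.parse import urlencode

BASE_URL = ("https://site.web.api.espn.com/apis/common/v3/"
            "sports/football/nfl/statistics/byathlete")


def build_url(page, season=2018, limit=50):
    """Hypothetical helper: rebuild the stats URL for a given page number."""
    params = {
        "region": "us",
        "lang": "en",
        "contentorigin": "espn",
        "isqualified": "false",
        "limit": limit,
        "category": "offense:rushing",        # urlencode turns ':' into %3A
        "sort": "rushing.rushingYards:desc",
        "season": season,                      # other seasons are an untested assumption
        "seasontype": 2,
        "page": page,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_url(2))
```

With the default arguments, build_url(2) reproduces the page 2 URL seen in DevTools.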

EDIT: get all data as a DataFrame

import requests
import pandas as pd

url = 'https://site.web.api.espn.com/apis/common/v3/sports/football/nfl/statistics/byathlete?region=us&lang=en&contentorigin=espn&isqualified=false&limit=50&category=offense%3Arushing&sort=rushing.rushingYards%3Adesc&season=2018&seasontype=2&page='

rows = []  # collect rows here; DataFrame.append was removed in pandas 2.0

for page in range(1, 4):
    print('page:', page)

    r = requests.get(url + str(page))
    data = r.json()

    #print(data.keys())

    for item in data['athletes']:
        player_name = item['athlete']['displayName']
        position = item['athlete']['position']['abbreviation']
        gp = item['categories'][0]['totals'][0]
        other_values = item['categories'][2]['totals']
        row = [player_name, position, gp] + other_values

        rows.append(row)  # append one row

# build the DataFrame once, after all pages have been fetched
df = pd.DataFrame(rows, columns=['NAME', 'POS', 'GP', 'ATT', 'YDS', 'AVG', 'LNG', 'BIG', 'TD', 'YDS/G', 'FUM', 'LST', 'FD'])

print(len(df)) # 150
print(df.head(20))
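The loops above stop at page 3 because range(1, 4) is hard-coded. If you do not know the number of pages in advance, one option is to keep requesting pages until one comes back without athletes. The sketch below assumes (this has not been verified against the API) that a page past the end returns an empty or missing 'athletes' list. fetch_all_athletes and fetch_page are hypothetical names; fetch_page is any function that takes a page number and returns the decoded JSON.

```python
def fetch_all_athletes(fetch_page, max_pages=100):
    """Collect 'athletes' items page by page until a page has none.

    fetch_page: callable taking a page number and returning the decoded JSON dict.
    max_pages:  safety cap so the loop always ends, even if the assumption
                about empty pages turns out to be wrong.
    """
    athletes = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        batch = data.get("athletes") or []
        if not batch:          # assumed end-of-data signal
            break
        athletes.extend(batch)
    return athletes


# Possible usage with the url from the code above (needs network access):
#   import requests
#   items = fetch_all_athletes(lambda page: requests.get(url + str(page)).json())
```

Passing the fetching step in as a function keeps the loop independent of requests, so it can be tried with canned data before hitting the real endpoint.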


 