I am trying to scrape the individual batted-ball data from individual player pages. Here is an example: https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020
The page seems to hide the data, or at least I can't get it using:
driver = webdriver.Chrome('/Users/gru/Documents/chromedriver')
driver.get('https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020')
html_page = driver.page_source
time.sleep(15)
soup = BeautifulSoup(html_page, 'lxml')
for j in soup.find_all('tr'):
    drounders=[]
    for h in j.find_all('td'):
        drounders.append(h.get_text())
    print(drounders)
Here are the first few expected rows:
#  Game Date   Bat Team  Fld Team  Pitcher           Result     EV (MPH)  LA (°)  Dist (ft)  Direction     Pitch (MPH)  Pitch Type
1  2020-08-12                      Carrasco, Carlos  strikeout
2  2020-08-12                      Carrasco, Carlos  strikeout
3  2020-08-12                      Carrasco, Carlos  force_out                               Opposite
4  2020-08-11                      Allen, Logan      force_out  107.8     -25     5          Pull          94.0         4-Seam Fastball
5  2020-08-11                      Allen, Logan      strikeout                                             77.3         Curveball
6  2020-08-11                      Hill, Cam         sac_fly    100.5     42      345        Straightaway  91.6         4-Seam Fastball
The only problem I see here is the Bat Team column, because it contains an image rather than text. In this answer I scrape the image tag (including its link) from the Bat Team column and append it at the end of each row. If you want to ignore it, remove img from the for loop.
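To show the image handling on its own, here is a minimal sketch that runs on a hard-coded HTML fragment instead of the live page. The fragment, the `SAMPLE_HTML` name and the `rows_with_logos` helper are my own illustrations, and I am assuming the rendered table looks roughly like this; the real markup may differ. It keeps each cell separate and puts the logo's `src` where the image cell was.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the rendered game-log table (assumption).
SAMPLE_HTML = """
<div id="gamelogs_statcast">
  <table>
    <tr><th>Game Date</th><th>Bat Team</th><th>Pitcher</th><th>Result</th></tr>
    <tr>
      <td>2020-08-13</td>
      <td><img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/></td>
      <td>Burnes, Corbin</td>
      <td>field_out</td>
    </tr>
  </table>
</div>
"""


def rows_with_logos(html):
    """Return one list of cell values per table row.

    Cells that hold an <img> contribute the image's src; all other
    cells contribute their stripped text.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="gamelogs_statcast")
    if container is None:
        return []  # table not present (for example, not rendered yet)
    rows = []
    for tr in container.find_all("tr"):
        cells = []
        for cell in tr.find_all(["th", "td"]):
            img = cell.find("img")
            if img is not None and img.get("src"):
                cells.append(img["src"])
            else:
                cells.append(cell.get_text(strip=True))
        if cells:
            rows.append(cells)
    return rows


rows = rows_with_logos(SAMPLE_HTML)
for row in rows:
    print(row)
```

With the live page you would pass `driver.page_source` instead of `SAMPLE_HTML`, once the table has rendered.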
Code:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

site = 'https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020'
finalData = []

# Here I am using Chrome's web driver
driver = webdriver.Chrome(executable_path = 'chromedriver.exe')
# For Firefox web driver:
# driver = webdriver.Firefox(executable_path = 'geckodriver.exe')
driver.get(site)
time.sleep(10)  # wait for the page to render the table BEFORE reading page_source

soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find("div", id = "gamelogs_statcast")
trs = table.find_all("tr")
for trValue in trs:
    txt = str(trValue.text)          # text of every cell in the row
    img = str(trValue.find("img"))   # Bat Team logo tag, or 'None' if the row has no image
    data = txt + img
    finalData.append(data)
print(finalData)
Output:
['Game DateBat TeamFld TeamPitcherResultEV (MPH)LA (°)Dist (ft)DirectionPitch (MPH)Pitch TypeNone', '1 2020-08-13 Burnes, Corbin field_out 104.1 24 400 Straightaway 95.7 4-Seam Fastball <img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/>', '2 2020-08-13 Burnes, Corbin walk 89.2 Slider <img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/>', '3 2020-08-13 Anderson, Brett hit_by_pitch 89.5 4-Seam Fastball <img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/>' ........]
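A fixed `time.sleep(10)` either waits too long or not long enough, so here is a rough sketch of polling instead. The `wait_for_page_source` helper and its `is_ready` check are hypothetical names of mine, not part of Selenium; the only thing it assumes about the driver is the `page_source` attribute used above. Selenium also ships its own `WebDriverWait` for this purpose, which is probably the better choice in real code.

```python
import time


def wait_for_page_source(driver, is_ready, timeout=15.0, poll=0.5):
    """Poll driver.page_source until is_ready(html) returns True.

    Returns the HTML that passed the check, or raises TimeoutError
    once `timeout` seconds have elapsed without a passing check.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("page did not become ready within %.1f s" % timeout)
        time.sleep(poll)


# Usage with the driver from the code above (the readiness check is a crude
# assumption: the container id is present and at least one <td> has rendered):
# html = wait_for_page_source(
#     driver, lambda h: 'gamelogs_statcast' in h and '<td' in h)
# soup = BeautifulSoup(html, 'html.parser')
```

The check is passed in as a function so you can tighten it once you know what the fully rendered table looks like.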
Hope this helps. Let me know if you need any other help with this answer.