
Automating a Button Press with Python

I've been trying for a while to figure out how to automate downloading the CSV from this page: https://razzball.com/mlbpitchingstats/

At the top, you can click a basic HTML input box and then click Download on that box. I know I could do this with a headless browser driver, but I've been trying to do it with requests, or to somehow hook into the download button's event listener.

When I monitor the Network tab, there seems to be no request to an API. Is a headless browser my only option? Is there any way to grab this with requests? Any help would be hugely appreciated!

Contrary to what your comment suggests, the data is not populated via JavaScript. Two clues point to this:

  1. If you look at the page source, the table is already populated in the HTML your browser receives.
  2. If you look at your browser's Network tab, there is no XHR request fetching the data.
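
You can also check clue 1 from a script: requests never runs JavaScript, so if the raw HTML it receives already contains table rows, the data is rendered on the server. A minimal sketch; the function name `has_server_rendered_rows` and the sample markup are hypothetical, and a regex is only good enough for this yes/no check, not for actually parsing the table.

```python
import re

# Matches an opening <tr> tag, with or without attributes, case-insensitively.
# Good enough for a yes/no check; do not use a regex to parse the table itself.
_TR_OPEN = re.compile(r"<tr[\s>]", re.IGNORECASE)


def has_server_rendered_rows(html: str, min_rows: int = 1) -> bool:
    """Return True if the raw (not JS-rendered) HTML already contains table rows."""
    return len(_TR_OPEN.findall(html)) >= min_rows


# Hypothetical markup standing in for response.text from a plain requests.get call.
static_page = "<table><tr><th>Name</th></tr><tr><td>Cole</td></tr></table>"
js_shell = "<div id='app'></div><script src='bundle.js'></script>"
print(has_server_rendered_rows(static_page))  # True
print(has_server_rendered_rows(js_shell))     # False
```

If this returns True for the `response.text` of the real page, requests alone is enough and no headless browser is needed.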

So, as @SuperStew suggested, you could give BeautifulSoup a try, looping over each <tr> element, although that can be a little cumbersome.
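
The row-by-row loop looks roughly like this. To keep the sketch dependency-free it swaps BeautifulSoup for the standard library's `html.parser`; the class and function names are hypothetical, and it assumes every cell has an explicit closing tag and that tables are not nested.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) is still collected into the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return a list of rows (lists of cell text) for all tables in html."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = ("<table><tr><th>Name</th><th>W</th></tr>"
          "<tr><td><a href='#'>Cole</a></td><td> 12 </td></tr></table>")
print(extract_rows(sample))  # [['Name', 'W'], ['Cole', '12']]
```

With BeautifulSoup the same loop is roughly `[[c.get_text(strip=True) for c in tr.find_all(["td", "th"])] for tr in soup.find_all("tr")]`. Either way you still get strings back and have to convert types yourself.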

If I needed that data in a usable format, I would use pandas; see the documentation for the read_html function. It has the added bonus that it should convert the data to the types you need (e.g. integers). But, as the docs state, you should expect a bit of data wrangling.
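
What that wrangling looks like depends on the page, and I haven't checked razzball's table specifically. Two common fixes for HTML stats tables are dropping header rows that repeat inside the body and coercing columns that came back as text because of placeholders like `-`. A minimal sketch under those assumptions; the function name and the `Name`/`W`/`ERA` columns are hypothetical.

```python
import pandas as pd


def clean_stats_table(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Drop repeated header rows, then coerce the given columns to numbers."""
    # A row whose every cell equals its column name is a header row that was
    # repeated inside the table body.
    header = pd.Series(df.columns.astype(str), index=df.columns)
    is_header_row = df.astype(str).eq(header, axis="columns").all(axis=1)
    out = df.loc[~is_header_row].copy()
    for col in numeric_cols:
        # Anything that still isn't a number (e.g. '-') becomes NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.reset_index(drop=True)


# Hypothetical frame shaped like a read_html result with a repeated header row.
raw = pd.DataFrame({"Name": ["Cole", "Name", "Burnes"],
                    "W": ["12", "W", "10"],
                    "ERA": ["3.10", "ERA", "-"]})
clean = clean_stats_table(raw, ["W", "ERA"])
print(clean)
```

`errors="coerce"` trades silent NaNs for not crashing on odd cells, so it is worth checking `clean.isna().sum()` afterwards.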


Edit:

It seems the site blocks scraping via User-Agent filtering, so you will have to use requests with a spoofed User-Agent to get the page HTML:

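```python
from io import StringIO
import pandas as pd
import requests
url = "https://razzball.com/mlbpitchingstats/"
# Spoof a browser User-Agent; the site appears to filter the default python-requests one.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # fail early (e.g. on a 403) rather than parse an error page
# read_html returns a list of DataFrames, one per <table>; StringIO avoids the
# FutureWarning newer pandas versions emit for literal HTML strings.
data = pd.read_html(StringIO(response.text))  # will need wrangling
```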
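
Since the original goal was the CSV behind the Download button, the last step is writing the parsed table back out. A minimal sketch; the function name is hypothetical, and picking the table with the most rows is a heuristic I'm assuming, not something verified against razzball's page.

```python
import pandas as pd


def stats_table_to_csv(tables, path=None):
    """Write the largest table from pd.read_html as CSV.

    Choosing the table with the most rows is a heuristic (assumption): pages
    often hold small legend or layout tables next to the main stats table.
    With path=None, pandas' to_csv returns the CSV text instead of writing a file.
    """
    if not tables:
        raise ValueError("no tables to convert")
    biggest = max(tables, key=len)  # len(DataFrame) is its number of rows
    return biggest.to_csv(path, index=False)


# Two hypothetical tables: a one-row legend and the stats table itself.
legend = pd.DataFrame({"Key": ["W = wins"]})
pitching = pd.DataFrame({"Name": ["Cole", "Burnes"], "W": [12, 10]})
print(stats_table_to_csv([legend, pitching]))
```

For a real run you would pass the `data` list from the snippet above plus a filename. If you know some text that only appears in the stats table, `pd.read_html(..., match="ERA")` narrows the list up front instead of relying on the size heuristic.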

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM