
Scrape website data for CSV

I'm rather inexperienced with this type of programming and much more familiar with embedded systems. I have very little web programming experience.

What I'd like to achieve:

A website (danglefactory.com) has a great table of statistics that I'd like to download as a CSV for processing. On the website, there is a button that calls an internal script to build a CSV and prepare it for download.

Referer http://www.danglefactory.com/projections/skaters/daily

Script http://www.danglefactory.com/scripts/copy_csv_xls.swf

I'd prefer a Python solution that can fetch this CSV to either temporary or local storage for processing.

Thanks in advance.

The first approach you can take is fairly low-level.

Under the hood, the page makes JSON API calls that you can simulate using, for example, requests.

Here is how you can get the daily projections:

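import requests

# JSON endpoint the page calls behind the scenes; the "_" value appears to be a cache-busting timestamp
url = 'http://www.danglefactory.com/api/DailySkaterProjections?_=1415200157912'
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error status

# Parse the JSON body into a list of dicts, one per player
data = response.json()
print(data)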

Prints (Python 2 output, hence the u'' prefixes; wrapped here for readability):

[{u'A': 0.61,
  u'Blocks': 0.37,
  u'Corsi': 0.53,
  u'FOL': 9.07,
  u'FOW': 8.95,
  u'FOWinPerc': 49.6,
  u'G': 0.39,
  u'Giveaways': 0.89,
  u'Hits': 0.54,
  u'Name': u'John Tavares',
  u'Opponent': u'ANA',
  u'P': 0.99,
  u'PIM': 0.51,
  u'PPA': 0.24,
  u'PPG': 0.11,
  u'PlayerID': 411,
  u'PlusMinus': 0.05,
  u'PrimaryPosition': u'C',
  u'SHA': 0.0,
  u'SHG': 0.0,
  u'ShPerc': 12.6,
  u'Shots': 3.1,
  u'TOI': 20.39,
  u'Takeaways': 0.82,
  u'Team': u'NYI'},
 {u'A': 0.7,
  u'Blocks': 1.0,
  u'Corsi': 0.47,
  u'FOL': 8.69,
  u'FOW': 8.43,
  u'FOWinPerc': 49.3,
  u'G': 0.28,
  u'Giveaways': 0.84,
  u'Hits': 1.49,
  u'Name': u'Ryan Getzlaf',
  u'Opponent': u'NYI',
  u'P': 0.97,
  u'PIM': 0.68,
  u'PPA': 0.22,
  u'PPG': 0.07,
  u'PlayerID': 161,
  u'PlusMinus': 0.06,
  u'PrimaryPosition': u'C',
  u'SHA': 0.04,
  u'SHG': 0.02,
  u'ShPerc': 11.9,
  u'Shots': 2.3,
  u'TOI': 20.52,
  u'Takeaways': 0.61,
  u'Team': u'ANA'},

  ...

}]
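The `_=1415200157912` value in the URL looks like a millisecond Unix timestamp of the kind JavaScript libraries often append to defeat caching. That is an assumption, not something the site documents. If it holds, a fresh value can be generated per request instead of hard-coding one. A minimal sketch, where `build_params` is a hypothetical helper name:

```python
import time


def build_params(now=None):
    """Return query parameters with a millisecond-epoch cache-buster.

    Assumption: the server treats "_" as an opaque cache-busting value.
    """
    if now is None:
        now = time.time()
    return {"_": int(round(now * 1000))}


# Fixed input so the result is reproducible without a network call.
params = build_params(1415200157.5)
```

The dictionary can then be passed as `requests.get(base_url, params=build_params())`, with `base_url` being the endpoint without its query string.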

Then you can convert the results into CSV using the csv module from the standard library.
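As a rough sketch of that conversion (Python 3), `csv.DictWriter` can write the list of dicts directly. The helper name `records_to_csv` is made up for illustration, and the sample rows are trimmed copies of the two players shown above so the example runs without touching the network. With live data you would pass `data` instead.

```python
import csv
import io


def records_to_csv(records, fileobj, fieldnames=None):
    """Write a list of dicts (e.g. the parsed JSON response) as CSV rows."""
    if fieldnames is None:
        # Union of keys across all records, sorted for a stable column order.
        fieldnames = sorted({key for record in records for key in record})
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


# Trimmed sample records copied from the output above.
sample = [
    {"Name": "John Tavares", "Team": "NYI", "Opponent": "ANA", "G": 0.39, "A": 0.61},
    {"Name": "Ryan Getzlaf", "Team": "ANA", "Opponent": "NYI", "G": 0.28, "A": 0.7},
]

buffer = io.StringIO()
columns = records_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of memory, open the target with `open("projections.csv", "w", newline="")` (the filename is only an example) and pass that file object in place of `buffer`.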


Another solution could be to use the selenium browser automation tool, but the problem is that the CSV button and the table are inside a Flash object, which selenium cannot interact with.


You can, though, use an image recognition and screen automation tool like sikuli to find that CSV button and click it, if you would still rather stay "high-level".

Hope that helps.
