
Scrape website data for CSV

I'm rather inexperienced with this type of programming; I'm much more familiar with embedded systems and have very little web programming experience.

What I'd like to achieve:

A website (danglefactory.com) has a great table of statistics that I'd like to download as a CSV for processing. On the website, there is a button that calls an internal script to build a CSV and prepare it for download.

Referer: http://www.danglefactory.com/projections/skaters/daily

Script: http://www.danglefactory.com/scripts/copy_csv_xls.swf

I'd prefer a Python solution that can fetch this CSV to either temporary or local storage for processing.

Thanks in advance.

The first approach you can take is fairly low-level.

Under the hood, the page makes JSON API calls that you can simulate using, for example, requests.

Here is how you can get the daily projections:

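import requests

# The "_" query parameter looks like a cache-busting timestamp added by the page's JavaScript.
url = 'http://www.danglefactory.com/api/DailySkaterProjections?_=1415200157912'
response = requests.get(url)

# Parse the JSON body into a list of dicts, one per player.
data = response.json()
print(data)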

Prints:

[{u'A': 0.61,
  u'Blocks': 0.37,
  u'Corsi': 0.53,
  u'FOL': 9.07,
  u'FOW': 8.95,
  u'FOWinPerc': 49.6,
  u'G': 0.39,
  u'Giveaways': 0.89,
  u'Hits': 0.54,
  u'Name': u'John Tavares',
  u'Opponent': u'ANA',
  u'P': 0.99,
  u'PIM': 0.51,
  u'PPA': 0.24,
  u'PPG': 0.11,
  u'PlayerID': 411,
  u'PlusMinus': 0.05,
  u'PrimaryPosition': u'C',
  u'SHA': 0.0,
  u'SHG': 0.0,
  u'ShPerc': 12.6,
  u'Shots': 3.1,
  u'TOI': 20.39,
  u'Takeaways': 0.82,
  u'Team': u'NYI'},
 {u'A': 0.7,
  u'Blocks': 1.0,
  u'Corsi': 0.47,
  u'FOL': 8.69,
  u'FOW': 8.43,
  u'FOWinPerc': 49.3,
  u'G': 0.28,
  u'Giveaways': 0.84,
  u'Hits': 1.49,
  u'Name': u'Ryan Getzlaf',
  u'Opponent': u'NYI',
  u'P': 0.97,
  u'PIM': 0.68,
  u'PPA': 0.22,
  u'PPG': 0.07,
  u'PlayerID': 161,
  u'PlusMinus': 0.06,
  u'PrimaryPosition': u'C',
  u'SHA': 0.04,
  u'SHG': 0.02,
  u'ShPerc': 11.9,
  u'Shots': 2.3,
  u'TOI': 20.52,
  u'Takeaways': 0.61,
  u'Team': u'ANA'},

  ...

}]
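If you would rather not hard-code the `_` value, the following is a minimal sketch that builds the URL with a fresh timestamp. It assumes the `_` parameter is only a cache-busting timestamp in milliseconds, which the site does not document, and it uses the standard library's `urllib` instead of requests so it has no third-party dependency. The helper names `build_url` and `fetch_daily_projections` are made up for this illustration.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = 'http://www.danglefactory.com/api/DailySkaterProjections'


def build_url(now=None):
    """Return the API URL with a millisecond timestamp in the "_" parameter.

    Assumption: "_" is just a cache buster, so any current timestamp works.
    """
    seconds = time.time() if now is None else now
    return API_URL + '?' + urlencode({'_': int(seconds * 1000)})


def fetch_daily_projections(timeout=10):
    """Download the projections and decode the JSON body into Python objects."""
    with urlopen(build_url(), timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

Calling `fetch_daily_projections()` should give the same kind of list of dicts as shown above, provided the endpoint is still available and behaves as assumed.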

Then you can convert the results to CSV using the csv module.
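As a rough sketch of that conversion step, `csv.DictWriter` can write a list of flat dicts directly. The function name `projections_to_csv` is hypothetical, the column order (sorted keys of the first record) is an arbitrary choice, and the sample rows are abbreviated copies of the two records printed above, so the block runs without network access.

```python
import csv
import io


def projections_to_csv(rows, out):
    """Write a list of flat dicts (one per player) to a file-like object as CSV."""
    if not rows:
        return
    # Take the column names from the first record; sort them for a stable order.
    fieldnames = sorted(rows[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Two abbreviated records taken from the printed output above.
sample = [
    {'Name': 'John Tavares', 'Team': 'NYI', 'Opponent': 'ANA',
     'G': 0.39, 'A': 0.61, 'P': 0.99},
    {'Name': 'Ryan Getzlaf', 'Team': 'ANA', 'Opponent': 'NYI',
     'G': 0.28, 'A': 0.7, 'P': 0.97},
]

# Write into an in-memory buffer; a real file works the same way.
buffer = io.StringIO()
projections_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To save to disk instead, pass a file opened with `open('projections.csv', 'w', newline='')` (the filename is just an example); the `newline=''` argument is what the csv module's documentation recommends in Python 3 to avoid blank lines on Windows.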


Another solution could be to use the selenium browser automation tool, but the problem is that the CSV button and the table are inside a Flash object, which selenium cannot interact with.


You can, however, use an image recognition and screen automation tool like sikuli to find that CSV button and click on it. This is an option if you still want to stay "high-level".

Hope that helps.
