Scrape website data for CSV
I'm rather inexperienced with this type of programming; I'm much more familiar with embedded systems and have very little web programming experience.
What I'd like to achieve:
A website (danglefactory.com) has a great table of statistics that I'd like to download into a CSV for processing. On the website, there is a button that calls an internal script to build a CSV and prepare it for download.
Referer: http://www.danglefactory.com/projections/skaters/daily
Script: http://www.danglefactory.com/scripts/copy_csv_xls.swf
I'd prefer a Python solution that can fetch this CSV into either temporary or local storage for processing.
Thanks in advance.
The first approach you can take is pretty low-level. Under the hood, the page makes JSON API calls that you can simulate using, for example, `requests`.
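Here is how you can get the daily projections:
```python
from pprint import pprint
import requests

url = 'http://www.danglefactory.com/api/DailySkaterProjections?_=1415200157912'
response = requests.get(url)
response.raise_for_status()  # stop here if the server returned an HTTP error
data = response.json()  # a list of dicts, one per player
pprint(data)
```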
Prints (abridged; this output was captured under Python 2, hence the `u''` string prefixes):
[{u'A': 0.61,
u'Blocks': 0.37,
u'Corsi': 0.53,
u'FOL': 9.07,
u'FOW': 8.95,
u'FOWinPerc': 49.6,
u'G': 0.39,
u'Giveaways': 0.89,
u'Hits': 0.54,
u'Name': u'John Tavares',
u'Opponent': u'ANA',
u'P': 0.99,
u'PIM': 0.51,
u'PPA': 0.24,
u'PPG': 0.11,
u'PlayerID': 411,
u'PlusMinus': 0.05,
u'PrimaryPosition': u'C',
u'SHA': 0.0,
u'SHG': 0.0,
u'ShPerc': 12.6,
u'Shots': 3.1,
u'TOI': 20.39,
u'Takeaways': 0.82,
u'Team': u'NYI'},
{u'A': 0.7,
u'Blocks': 1.0,
u'Corsi': 0.47,
u'FOL': 8.69,
u'FOW': 8.43,
u'FOWinPerc': 49.3,
u'G': 0.28,
u'Giveaways': 0.84,
u'Hits': 1.49,
u'Name': u'Ryan Getzlaf',
u'Opponent': u'NYI',
u'P': 0.97,
u'PIM': 0.68,
u'PPA': 0.22,
u'PPG': 0.07,
u'PlayerID': 161,
u'PlusMinus': 0.06,
u'PrimaryPosition': u'C',
u'SHA': 0.04,
u'SHG': 0.02,
u'ShPerc': 11.9,
u'Shots': 2.3,
u'TOI': 20.52,
u'Takeaways': 0.61,
u'Team': u'ANA'},
...
}]
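The `_=1415200157912` query parameter in that URL looks like a millisecond timestamp, the kind jQuery appends when caching is disabled; that reading is an assumption, and the server may ignore the parameter entirely. If you poll the endpoint regularly, here is a minimal stdlib-only sketch that regenerates it on each call. It swaps `requests` for `urllib.request` so nothing extra needs installing; `build_url` and `fetch_projections` are hypothetical helper names, and the endpoint may have changed or been retired since this answer was written.

```python
import json
import time
import urllib.parse
import urllib.request

# Endpoint taken from the answer above; it may no longer exist.
API_URL = "http://www.danglefactory.com/api/DailySkaterProjections"


def build_url(base=API_URL, now=None):
    """Append a `_=<milliseconds>` cache-busting parameter (hypothetical helper)."""
    if now is None:
        now = time.time()
    query = urllib.parse.urlencode({"_": round(now * 1000)})
    return f"{base}?{query}"


def fetch_projections(url=None, timeout=10.0):
    """Download the JSON array and return it as a list of dicts (hypothetical helper)."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(url or build_url(), timeout=timeout) as resp:
        return json.load(resp)  # json.load accepts a binary stream (Python 3.6+)
```

Calling `data = fetch_projections()` should return the same structure as `response.json()` in the `requests` snippet, assuming the API still responds the same way.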
Then you can convert the results into CSV using the `csv` module.
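A minimal sketch of that conversion with `csv.DictWriter`, assuming `data` is the list of dicts shown above. The column order is taken from the keys in first-seen order, which is a choice of this sketch rather than something the API guarantees; `write_csv` and `rows_to_csv_text` are hypothetical names. Since the question asks for temporary or local storage, the `__main__` part writes a few sample rows (copied from the output above) to a `tempfile.NamedTemporaryFile`.

```python
import csv
import io
import tempfile


def write_csv(rows, fileobj, fieldnames=None):
    """Write a list of dicts to an open text file as CSV; return the column list."""
    if fieldnames is None:
        # Union of keys across all rows, in first-seen order, so no column is dropped
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    # Missing keys become empty cells (restval=""); keys not in fieldnames are skipped
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


def rows_to_csv_text(rows, fieldnames=None):
    """Same as write_csv, but return the CSV as a string (handy for testing)."""
    buf = io.StringIO()
    write_csv(rows, buf, fieldnames)
    return buf.getvalue()


if __name__ == "__main__":
    # A few fields from the sample output above; in practice pass the full `data` list
    sample = [
        {"Name": "John Tavares", "Team": "NYI", "G": 0.39, "A": 0.61},
        {"Name": "Ryan Getzlaf", "Team": "ANA", "G": 0.28, "A": 0.7},
    ]
    # newline="" is what the csv docs recommend for files handed to csv writers;
    # delete=False keeps the temp file around for later processing
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
    ) as tmp:
        write_csv(sample, tmp)
    print("CSV written to", tmp.name)
```

To keep only some columns, pass them explicitly, e.g. `write_csv(data, f, ["Name", "Team", "G", "A", "P"])`; any other keys are ignored.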
Another solution could be to use the `selenium` browser automation tool, but the problem is that the CSV button and the table are inside a Flash object, which `selenium` cannot interact with.
You can, though, use an image recognition and screen automation tool like `sikuli` to find that CSV button and click on it. This is an option if you would still rather stay at the "high level".
Hope that helps.