Soccer Stats Python Scraper
I'm looking to scrape some Houston Dynamo stats from this season into a CSV and then visualize that data with R.
How can I scrape both the tr and td elements using lxml? Is there an easier selector I should be looking at?
For (reasonably) well-formed HTML tables, the XML package in R makes this sort of thing pretty stupidly easy:
> library(XML)
> url <- "http://www.houstondynamo.com/stats/season?page=0"
> tbl <- readHTMLTable(url)
> head(tbl[[1]])
           Player POS GP GS MINS  G  A SHTS SOG GWG PKG/A HmG RdG G/90min  SC%
1      Will Bruin   F 32 31 2510 12  4   78  35   0   0/0   6   6    0.43 15.4
2      Brad Davis   M 31 28 2523  8 12   53  22   3   3/4   5   3    0.29 15.1
3     Brian Ching   F 30 13 1385  5  5   35  15   1   2/2   2   3    0.32 14.3
4   Boniek Garcia   M 17 17 1530  4  6   30  12   1   0/0   3   1    0.24 13.3
5      Calen Carr   M 26 17 1512  4  2   29  11   2   0/0   3   1    0.24 13.8
6 Macoumba Kandji   F 29 21 1630  4  2   34  16   1   0/0   3   1    0.22 11.8
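
If you'd rather stay in Python, as the question asks, here is a minimal lxml sketch of the same idea: select every tr in the table, then take the th and td cells of each row. The SAMPLE_HTML string and the helper names table_rows and rows_to_csv are made up for illustration. The sample only mimics the first three columns of the output above, and the real page's markup may well differ, so treat this as a starting point rather than a tested scraper for that site.

```python
import csv
import io

from lxml import html

# Hypothetical stand-in for the stats page; the real markup is an assumption.
SAMPLE_HTML = """
<html><body>
<table>
  <thead>
    <tr><th>Player</th><th>POS</th><th>GP</th></tr>
  </thead>
  <tbody>
    <tr><td>Will Bruin</td><td>F</td><td>32</td></tr>
    <tr><td>Brad Davis</td><td>M</td><td>31</td></tr>
  </tbody>
</table>
</body></html>
"""


def table_rows(page_source):
    """Return the first table in the page as a list of rows of cell text."""
    doc = html.fromstring(page_source)
    table = doc.xpath("//table")[0]
    rows = []
    for tr in table.xpath(".//tr"):
        # th matches header cells, td matches data cells, in document order
        cells = [c.text_content().strip() for c in tr.xpath("./th | ./td")]
        if cells:
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text (write this to a file for use in R)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = table_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To run it against the live page, download the HTML with whatever HTTP client you like and pass the text to table_rows in place of SAMPLE_HTML. If the page holds more than one table, the [0] index would need adjusting.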