Python scraper for soccer stats

Question

I'm looking to scrape some Houston Dynamo stats from this season into a CSV and then visualize that data with R.

How can I scrape both the tr and td elements using lxml? Is there an easier selector I should be looking at?

Answer 1

For (reasonably) well-formed HTML tables, the XML package in R makes this sort of thing pretty stupidly easy:

> library(XML)
> url <- "http://www.houstondynamo.com/stats/season?page=0"
> tbl <- readHTMLTable(url)
> head(tbl[[1]])
           Player POS GP GS MINS  G  A SHTS SOG GWG PKG/A HmG RdG G/90min  SC%
1      Will Bruin   F 32 31 2510 12  4   78  35   0   0/0   6   6    0.43 15.4
2      Brad Davis   M 31 28 2523  8 12   53  22   3   3/4   5   3    0.29 15.1
3     Brian Ching   F 30 13 1385  5  5   35  15   1   2/2   2   3    0.32 14.3
4   Boniek Garcia   M 17 17 1530  4  6   30  12   1   0/0   3   1    0.24 13.3
5      Calen Carr   M 26 17 1512  4  2   29  11   2   0/0   3   1    0.24 13.8
6 Macoumba Kandji   F 29 21 1630  4  2   34  16   1   0/0   3   1    0.22 11.8
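
If you would rather stay in Python, as the question asks, something along these lines might work with lxml. This is only a minimal sketch: it assumes the stats live in a plain HTML table whose rows are tr elements holding th/td cells, and the inline SAMPLE markup and the helper name table_rows are made up for illustration (only a few columns from the output above are reproduced), not taken from the live page.

```python
import csv
import io

from lxml import html

# Hypothetical stand-in for the page source; the real page's markup may differ.
SAMPLE = """
<html><body>
<table>
  <tr><th>Player</th><th>POS</th><th>GP</th><th>G</th></tr>
  <tr><td>Will Bruin</td><td>F</td><td>32</td><td>12</td></tr>
  <tr><td>Brad Davis</td><td>M</td><td>31</td><td>8</td></tr>
</table>
</body></html>
"""


def table_rows(page_source):
    """Return the first table in the page as a list of rows of cell text."""
    doc = html.fromstring(page_source)
    table = doc.xpath("//table")[0]
    rows = []
    # One XPath gets every row; a second gets the header and data cells of that row.
    for tr in table.xpath(".//tr"):
        cells = [cell.text_content().strip() for cell in tr.xpath("./th | ./td")]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE)

# Write the rows as CSV into an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Selecting the tr elements first and then the th/td cells relative to each row keeps every record together, which is usually simpler than selecting all td elements at once and regrouping them afterwards.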

Answer 1: score 5, accepted, 2012-12-05 22:16:30
