Correct syntax for xpathSApply in R
I'm struggling to get the statistics table from a website into a data frame so I can analyze it. The table can be found here: http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/
My code so far:
library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
xpathSApply(doc, "//tr[@*]/td/child::node()", xmlValue)
But this returns the data in an unworkable form: one long, flat character vector instead of rows and columns. What is the correct xpathSApply code?
The table with the data has id='page_team_1_block_team_squad_3-table', and you can use this in an XPath. The XPath "//table[@id='page_team_1_block_team_squad_3-table']/tbody" will find the table with that id and return the table body. You can then use readHTMLTable with the argument header = FALSE to return the data:
library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
# doc["<xpath>"] returns the matching nodes; [[1]] is the <tbody> of the squad table
res <- readHTMLTable(doc["//table[@id='page_team_1_block_team_squad_3-table']/tbody"][[1]], header = FALSE)
head(res)
> head(res)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1 1 K. Vermeer 28 K 856 10 10 0 1 24 0 0 0 0 0
2 22 J. Cillessen 25 K 2204 25 24 1 0 8 0 0 0 0 0
3 30 M. van der Hart 20 K 0 0 0 0 0 2 0 0 0 0 0
4 2 R. van Rhijn 23 V 2786 32 31 1 1 1 2 3 6 0 1
5 3 T. Alderweireld 25 V 360 4 4 0 0 0 0 0 0 0 0
6 4 N. Moisander 28 V 1985 23 22 1 0 3 1 2 0 0 0
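If you want to stay with xpathSApply, as the question asks, one possible approach (a sketch, not either answerer's method) is to select each table row with an id-based XPath and then call xpathSApply again on that row's <td> cells. Every row then comes back as one character vector, and xpathSApply simplifies equal-length results into a matrix. The example runs offline: the HTML string, its id 'squad-demo' and the shortened four-column rows (taken from the output above) are hypothetical stand-ins, so on the real page you would use htmlParse(url) and the table's actual id.

```r
library(XML)

# Hypothetical stand-in for the squad table (id and rows are illustrative only).
# On the live page, use doc <- htmlParse(url) and the real table id instead.
html <- "<html><body>
<table id='squad-demo'><tbody>
<tr><td>1</td><td>K. Vermeer</td><td>28</td><td>K</td></tr>
<tr><td>22</td><td>J. Cillessen</td><td>25</td><td>K</td></tr>
</tbody></table>
</body></html>"
doc <- htmlParse(html, asText = TRUE)

# Outer call: one node per table row.
# Inner call: the whitespace-trimmed text of every <td> in that row.
# Equal-length row vectors are simplified into a matrix with one column per row.
cells <- xpathSApply(doc, "//table[@id='squad-demo']/tbody/tr",
                     function(tr) xpathSApply(tr, "./td",
                                              function(td) trimws(xmlValue(td))))

# Transpose so each row is a player, then build a data frame (columns V1, V2, ...).
squad <- as.data.frame(t(cells), stringsAsFactors = FALSE)
print(squad)
```

If rows have different numbers of cells (for example because of colspan), the simplification step cannot build a matrix and xpathSApply returns a list instead, which is one reason readHTMLTable is the more convenient tool here.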
You don't need xpathSApply. Given the url, this one-liner can do it:
readHTMLTable(url, header = "")[[1]]
giving:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1 1 K. Vermeer 28 K 856 10 10 0 1 24 0 0 0 0 0
2 22 J. Cillessen 25 K 2204 25 24 1 0 8 0 0 0 0 0
3 30 M. van der Hart 20 K 0 0 0 0 0 2 0 0 0 0 0
4 2 R. van Rhijn 23 V 2786 32 31 1 1 1 2 3 6 0 1
5 3 T. Alderweireld 25 V 360 4 4 0 0 0 0 0 0 0 0
6 4 N. Moisander 28 V 1985 23 22 1 0 3 1 2 0 0 0
7 6 M. van der Hoorn 21 V 166 3 2 1 1 21 0 0 0 0 0
8 12 J. Veltman 22 V 2158 25 24 1 1 2 2 2 2 0 0
9 15 N. Boilesen 22 V 1445 20 17 3 6 6 1 2 3 0 0
10 17 D. Blind 24 V 2531 29 29 0 5 3 1 1 4 0 0
11 24 S. Denswil 21 V 1350 17 15 2 1 14 1 0 1 0 0
12 27 R. Ligeon 22 V 350 5 4 1 3 8 0 1 0 0 0
13 42 J. Riedewald 17 V 222 5 3 2 3 10 2 0 1 0 0
14 44 K. Tete 18 V 0 0 0 0 0 1 0 0 0 0 0
15 5 C. Poulsen 34 M 1523 29 14 15 3 20 1 3 2 0 0
16 8 L. Duarte 23 M 655 14 6 8 2 14 3 0 1 0 0
17 8 C. Eriksen 22 M 360 4 4 0 0 0 2 3 1 0 0
18 10 S. de Jong 25 M 1257 19 16 3 8 3 7 1 1 0 0
19 18 D. Klaassen 21 M 2102 26 23 3 2 5 10 3 1 0 0
20 20 L. Schöne 28 M 2149 29 25 4 6 6 9 8 1 0 0
21 25 T. Serero 24 M 2276 29 25 4 6 6 3 3 3 0 0
22 34 L. de Sa 21 M 512 12 5 7 5 12 1 1 1 0 0
23 7 V. Fischer 20 A 1636 24 19 5 6 6 3 2 1 0 0
24 9 K. Sigþórsson 24 A 1928 30 20 10 16 11 10 2 0 0 0
25 11 Bojan 23 A 1357 24 17 7 12 11 4 3 2 0 0
26 16 L. Andersen 19 A 405 9 4 5 3 14 0 0 0 0 0
27 19 T. Sana 24 A 223 4 2 2 1 7 0 0 0 0 0
28 23 D. Hoesen 23 A 450 14 4 10 2 15 2 1 0 0 0
29 43 R. Kishna 19 A 389 8 5 3 5 5 1 2 0 0 0
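The one-liner works because readHTMLTable, when given a whole page, returns a list with one data frame per <table>, and [[1]] keeps the first one. The sketch below shows this offline on a hypothetical two-table page parsed from a string (the markup and values are illustrative, not the live site). It passes header = FALSE, as the first answer does, and adds an optional step with base R's type.convert, assuming you want numeric columns for analysis, since the cells typically come back as text.

```r
library(XML)

# Hypothetical page with two tables; markup and values are illustrative only.
html <- "<html><body>
<table><tbody>
<tr><td>1</td><td>K. Vermeer</td><td>28</td><td>856</td></tr>
<tr><td>22</td><td>J. Cillessen</td><td>25</td><td>2204</td></tr>
</tbody></table>
<table><tbody>
<tr><td>a</td><td>b</td></tr>
<tr><td>c</td><td>d</td></tr>
</tbody></table>
</body></html>"
doc <- htmlParse(html, asText = TRUE)

# On a whole document, readHTMLTable() returns a list: one data frame per table.
tables <- readHTMLTable(doc, header = FALSE, stringsAsFactors = FALSE)
length(tables)

# [[1]] keeps the first table, as in the one-liner.
squad <- tables[[1]]

# Optional: convert numeric-looking text columns to numbers; as.is = TRUE keeps
# the remaining text columns (e.g. player names) as character.
squad[] <- lapply(squad, type.convert, as.is = TRUE)
str(squad)
```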