Correct syntax for xpathSApply in R

I'm struggling to get the statistics table from a website into a data frame so that I can analyse it. The table can be found here: http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/

My code so far:

library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
xpathSApply(doc, "//tr[@*]/td/child::node()", xmlValue)

But this returns the data in an unworkable form. What is the correct xpathSApply code?

The table with the data has id='page_team_1_block_team_squad_3-table', and you can use this in an XPath. The XPath "//table[@id='page_team_1_block_team_squad_3-table']/tbody" finds the table with that id and returns its body. You can then use readHTMLTable with the argument header = FALSE to return the data:

library(XML)
url <- "http://nl.soccerway.com/teams/netherlands/afc-ajax/1515/squad/"
doc <- htmlParse(url)
res <- readHTMLTable(doc["//table[@id='page_team_1_block_team_squad_3-table']/tbody"][[1]], header = FALSE)
head(res)

giving:

V1 V2              V3 V4 V5 V6   V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1  1         K. Vermeer    28  K  856 10 10   0   1  24   0   0   0   0
2 22       J. Cillessen    25  K 2204 25 24   1   0   8   0   0   0   0
3 30    M. van der Hart    20  K    0  0  0   0   0   2   0   0   0   0
4  2       R. van Rhijn    23  V 2786 32 31   1   1   1   2   3   6   0
5  3    T. Alderweireld    25  V  360  4  4   0   0   0   0   0   0   0
6  4       N. Moisander    28  V 1985 23 22   1   0   3   1   2   0   0
V17
1   0
2   0
3   0
4   1
5   0
6   0
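
If you do want xpathSApply itself, the same id-based XPath can drive it. The following is only a sketch under assumptions: it parses a small inline HTML string instead of the live page, and the table id 'squad' and its three columns are made up for illustration, not the real soccerway markup. The idea is to select the rows first and then apply xpathSApply to each row, so the cells of one row stay together instead of being flattened into a single vector.

```r
library(XML)

# Hypothetical stand-in for the real page: a tiny table with an id attribute.
html <- "<html><body>
<table id='squad'>
  <thead><tr><th>No</th><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>K. Vermeer</td><td>28</td></tr>
    <tr><td>22</td><td>J. Cillessen</td><td>25</td></tr>
  </tbody>
</table>
</body></html>"

doc <- htmlParse(html, asText = TRUE)

# Select the body rows of the table with that id.
rows <- getNodeSet(doc, "//table[@id='squad']/tbody/tr")

# For each row, collect the text of its cells. sapply() returns one column
# per table row here, so transpose to get one matrix row per table row.
cells <- t(sapply(rows, function(tr) xpathSApply(tr, "./td", xmlValue)))

# Columns get the default names V1, V2, V3.
squad <- as.data.frame(cells, stringsAsFactors = FALSE)
```

On the real page you would swap in the actual table id; readHTMLTable does this same row and cell walk for you, which is why it is usually the shorter route.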

You don't need xpathSApply. Given the url, this one-liner can do it:

readHTMLTable(url, header = "")[[1]]

giving:

   V1 V2               V3 V4 V5 V6   V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1   1          K. Vermeer    28  K  856 10 10   0   1  24   0   0   0   0   0
2  22        J. Cillessen    25  K 2204 25 24   1   0   8   0   0   0   0   0
3  30     M. van der Hart    20  K    0  0  0   0   0   2   0   0   0   0   0
4   2        R. van Rhijn    23  V 2786 32 31   1   1   1   2   3   6   0   1
5   3     T. Alderweireld    25  V  360  4  4   0   0   0   0   0   0   0   0
6   4        N. Moisander    28  V 1985 23 22   1   0   3   1   2   0   0   0
7   6    M. van der Hoorn    21  V  166  3  2   1   1  21   0   0   0   0   0
8  12          J. Veltman    22  V 2158 25 24   1   1   2   2   2   2   0   0
9  15         N. Boilesen    22  V 1445 20 17   3   6   6   1   2   3   0   0
10 17            D. Blind    24  V 2531 29 29   0   5   3   1   1   4   0   0
11 24          S. Denswil    21  V 1350 17 15   2   1  14   1   0   1   0   0
12 27           R. Ligeon    22  V  350  5  4   1   3   8   0   1   0   0   0
13 42        J. Riedewald    17  V  222  5  3   2   3  10   2   0   1   0   0
14 44             K. Tete    18  V    0  0  0   0   0   1   0   0   0   0   0
15  5          C. Poulsen    34  M 1523 29 14  15   3  20   1   3   2   0   0
16  8           L. Duarte    23  M  655 14  6   8   2  14   3   0   1   0   0
17  8          C. Eriksen    22  M  360  4  4   0   0   0   2   3   1   0   0
18 10          S. de Jong    25  M 1257 19 16   3   8   3   7   1   1   0   0
19 18         D. Klaassen    21  M 2102 26 23   3   2   5  10   3   1   0   0
20 20           L. Schöne    28  M 2149 29 25   4   6   6   9   8   1   0   0
21 25           T. Serero    24  M 2276 29 25   4   6   6   3   3   3   0   0
22 34            L. de Sa    21  M  512 12  5   7   5  12   1   1   1   0   0
23  7          V. Fischer    20  A 1636 24 19   5   6   6   3   2   1   0   0
24  9       K. Sigþórsson    24  A 1928 30 20  10  16  11  10   2   0   0   0
25 11               Bojan    23  A 1357 24 17   7  12  11   4   3   2   0   0
26 16         L. Andersen    19  A  405  9  4   5   3  14   0   0   0   0   0
27 19             T. Sana    24  A  223  4  2   2   1   7   0   0   0   0   0
28 23           D. Hoesen    23  A  450 14  4  10   2  15   2   1   0   0   0
29 43           R. Kishna    19  A  389  8  5   3   5   5   1   2   0   0   0
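
Since the goal is analysis, note that the columns of a scraped table may come back as text rather than numbers. A minimal sketch of one way to convert them, assuming the columns are character vectors: the small data frame below is a hand-typed stand-in built from the first three rows above (only four of the columns), not the result of a live download.

```r
# Hand-typed stand-in for a few columns of the scraped table (all text).
res <- data.frame(
  V1 = c("1", "22", "30"),
  V3 = c("K. Vermeer", "J. Cillessen", "M. van der Hart"),
  V5 = c("28", "25", "20"),
  V7 = c("856", "2204", "0"),
  stringsAsFactors = FALSE
)

# Convert each column to the most specific type that fits it;
# as.is = TRUE keeps non-numeric columns as character instead of factor.
res[] <- lapply(res, type.convert, as.is = TRUE)

str(res)
```

After this step the numeric columns can be used directly in calls such as sum() or mean(), while the name column stays as text.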
