
Extract Website Tables using rvest and html_nodes() and html_table()

I am trying to extract data from the Basketball Reference website.

library(rvest)
data7 <- read_html("http://www.basketball-reference.com/teams/CLE/2017.html") %>%
html_nodes("[id=roster]") %>%
html_table()
data7

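The code above returns the data from the "roster" table. However, with the selector line changed as follows, it does not return the "team_misc" table; it returns a list of length zero instead: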

html_nodes("[id=team_misc]") %>%

I am fairly new to rvest, so if anyone has an idea why this does not work, I would greatly appreciate it.

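There is actually already an answer for this, but it applies to an older version of the website. ...The reason you cannot get the other tables is that they are created dynamically: in the raw page that R reads, the tables you want sit inside commented-out strings. Inspect the page's elements in Chrome to see what I am referring to. Another answer is here: How to scrape a table inside a comment tag in html with R?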
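A minimal offline sketch of the problem, using a made-up HTML snippet rather than the real page (the table ids mirror the ones in the question; the contents are invented). It assumes a reasonably recent rvest and xml2. The table wrapped in a comment is invisible to html_nodes() until the comment text is parsed as HTML in its own right:

```r
library(rvest)
library(xml2)

# Hypothetical miniature page: one normal table, one inside an HTML comment
html <- '<html><body>
<table id="roster"><tr><th>Player</th></tr><tr><td>A</td></tr></table>
<!--
<table id="team_misc"><tr><th>W</th><th>L</th></tr><tr><td>1</td><td>2</td></tr></table>
-->
</body></html>'

page <- read_html(html)

# The commented-out table is not part of the parsed element tree
n_direct <- length(html_nodes(page, "#team_misc"))  # 0

# Parse the comment text as HTML, then select the table by id
inner <- page %>%
  xml_find_all("//comment()") %>%
  xml_text() %>%
  paste0(collapse = "") %>%
  read_html()

misc <- inner %>% html_node("#team_misc") %>% html_table()
```

Selecting by id on the re-parsed comment text returns a single table instead of a list of every table on the page.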

But for your yearly data:

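library(rvest)
library(xml2) # xml_find_all() and xml_text() come from xml2

A <- read_html('http://www.basketball-reference.com/teams/CLE/2017.html') %>% # Read in the raw webpage
  xml_find_all('//comment()') %>% # Use XPath to find all comment nodes
  xml_text() %>% # Convert the comments to raw strings
  paste0(collapse = "") %>% # Collapse them into a single string
  read_html() %>% # Re-read that string as HTML content
  xml_find_all("//table") %>% # Find every table that was hidden in a comment
  html_table() # Convert each table to a data frame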

cat(capture.output(lapply(A, head, 1)), sep = "\n")


[[1]]
                   Date Type                                                                                       Note
1 Kevin Love 2017-02-12 Knee Love is expected to miss six weeks after undergoing arthroscopic surgery on his left knee.

[[2]]
            X1                X2
1 Jim Boylan   Assistant Coach

[[3]]
        G    MP   FG  FGA   FG%  3P  3PA  3P%   2P  2PA   2P%   FT  FTA   FT% ORB  DRB  TRB  AST STL BLK TOV   PF  PTS
1 Team 58 14020 2305 4938 0.467 761 1952 0.39 1544 2986 0.517 1073 1420 0.756 564 1988 2552 1304 414 237 804 1033 6444

[[4]]
   NA NA NA NA  NA  NA  NA   NA   NA   NA Advanced   NA Offense Four Factors   NA   NA     NA Defense Four Factors   NA   NA     NA               NA
1   W  L PW PL MOV SOS SRS ORtg DRtg Pace      FTr 3PAr                 eFG% TOV% ORB% FT/FGA                 eFG% TOV% DRB% FT/FGA Arena Attendance

[[5]]
  Rk              Age  G GS   MP  FG  FGA   FG%  3P 3PA   3P%  2P  2PA   2P%  eFG%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV  PF PTS/G
1  1 LeBron James  32 54 54 37.5 9.6 17.7 0.541 1.7 4.4 0.387 7.9 13.3 0.592 0.589 4.8 6.9 0.691 1.1 6.7 7.9 8.9 1.4 0.6 4.3 1.7  25.7

[[6]]
  Rk              Age  G GS   MP  FG FGA   FG% 3P 3PA   3P%  2P 2PA   2P%  eFG%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV PF  PTS
1  1 LeBron James  32 54 54 2026 518 957 0.541 92 238 0.387 426 719 0.592 0.589 259 375 0.691  62 363 425 479  74  32 230 92 1387

[[7]]
  Rk              Age  G GS   MP  FG FGA   FG%  3P 3PA   3P%  2P  2PA   2P%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV  PF  PTS
1  1 LeBron James  32 54 54 2026 9.2  17 0.541 1.6 4.2 0.387 7.6 12.8 0.592 4.6 6.7 0.691 1.1 6.5 7.6 8.5 1.3 0.6 4.1 1.6 24.6

[[8]]
  Rk              Age  G GS   MP   FG  FGA   FG%  3P 3PA   3P%   2P  2PA   2P%  FT FTA   FT% ORB DRB  TRB  AST STL BLK TOV  PF PTS    ORtg DRtg
1  1 LeBron James  32 54 54 2026 12.7 23.4 0.541 2.3 5.8 0.387 10.4 17.6 0.592 6.3 9.2 0.691 1.5 8.9 10.4 11.7 1.8 0.8 5.6 2.3  34 NA  118  107

[[9]]
  Rk              Age  G   MP  PER   TS%  3PAr   FTr ORB% DRB% TRB% AST% STL% BLK% TOV% USG%    OWS DWS  WS WS/48    OBPM DBPM BPM VORP
1  1 LeBron James  32 54 2026 26.3 0.618 0.249 0.392  3.5 19.1 11.6 41.7  1.8  1.3   17 29.4 NA 6.9 2.4 9.3  0.22 NA  6.3  1.8   8  5.1

[[10]]
     NA   NA   NA   NA   NA   NA                   NA   NA   NA   NA NA   NA              NA   NA   NA   NA NA   NA 2-Pt Field Goals    NA   NA 3-Pt Field Goals     NA
1  <NA> <NA> <NA> <NA> <NA> <NA> % of FGA by Distance <NA> <NA> <NA> NA <NA> FG% by Distance <NA> <NA> <NA> NA <NA>                  Dunks <NA>                  Corner
    NA     NA   NA
1 <NA> Heaves <NA>

[[11]]
  Rk                   Salary
1  1 LeBron James $30,963,450

[[12]]
                           Yr  Tm Rd Pk             Team     G  MP FG FGA   FG% 3P 3PA 3P% FT FTA   FT% ORB DRB TRB AST STL BLK TOV PF PTS
1 Vladimir Veremeenko NA 2006 WAS  2 48 NA Reggio Emilia it 18 139 17  29 0.586  0   0  NA  4   9 0.444  14  10  24   8   2   3   9 33  38
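
In the output above, table [[4]] has a two-level header, so html_table() leaves the real column names (W, L, PW, ...) in the first data row. A minimal sketch of promoting that row to the header, shown on a small made-up data frame rather than the scraped one; `promote_first_row` is a hypothetical helper, not part of rvest:

```r
# Hypothetical helper: use the first row as column names, then drop it
promote_first_row <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  names(df) <- as.character(unlist(df[1, ]))
  df <- df[-1, , drop = FALSE]
  rownames(df) <- NULL
  # Re-guess column types now that the header text is gone
  df[] <- lapply(df, function(col) type.convert(as.character(col), as.is = TRUE))
  df
}

# Made-up stand-in shaped like table [[4]]: the real names are in row 1
raw <- data.frame(X1 = c("W", "40"), X2 = c("L", "18"), stringsAsFactors = FALSE)
fixed <- promote_first_row(raw)
```

On the real data this would presumably be called as promote_first_row(A[[4]]), assuming the list order shown above. Repeated names such as eFG% stay duplicated unless they are passed through make.unique().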

