R: Trouble getting a data frame from an HTML table using XML

Question

I'm trying to download the data from the table at http://www.boxofficemojo.com/weekend/chart/?view=&yr=2015&wknd=09&p=.htm into a data frame. This is the code I'm using:

library(XML)
data <- readHTMLTable('http://www.boxofficemojo.com/weekend/chart/?view=&yr=2015&wknd=09&p=.htm')

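I'm not very familiar with the XML library, and I'm not sure how to get the data out of the result. The table is somewhere inside 'data', but the object is really ugly and I can't figure out how to work with it. Any suggestions?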

Answer 1

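In the browser you can see that the HTML table has 46 rows and 12 columns. Use str to check whether the result (your data) contains something with similar dimensions: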

> str(data, max.level = 1)
List of 5
 $ NULL:'data.frame':   0 obs. of  0 variables
 $ NULL: NULL
 $ NULL: NULL
 $ NULL:'data.frame':   49 obs. of  12 variables:
 $ NULL:'data.frame':   46 obs. of  12 variables:

The last table (number 5) looks like the one you're after. Your table is:

my_table <- data[[5]]
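
If you'd rather not read the index off the str output by eye, the same check can be scripted. This is a minimal sketch in base R, assuming a hypothetical list named tables that stands in for what readHTMLTable returned (empty data frames, NULLs, and two real tables); the 46 x 12 target is the size seen in the browser.

```r
# Hypothetical stand-in for the list returned by readHTMLTable():
# a mix of empty data frames, NULLs, and real tables.
tables <- list(
  data.frame(),
  NULL,
  NULL,
  as.data.frame(matrix("x", nrow = 49, ncol = 12)),
  as.data.frame(matrix("x", nrow = 46, ncol = 12))
)

# Rows and columns of each element; non-data-frame entries get NA.
dims <- t(vapply(
  tables,
  function(x) if (is.data.frame(x)) dim(x) else c(NA_integer_, NA_integer_),
  integer(2)
))
colnames(dims) <- c("rows", "cols")

# Index of the element matching the size seen in the browser (46 x 12).
target <- which(dims[, "rows"] == 46 & dims[, "cols"] == 12)
my_table <- tables[[target]]
```

With the real result you would pass data in place of tables; if more than one table has the same size, target will hold several indices and you would need to pick one.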

You can also specify the table number directly with the which argument:

my_table <- readHTMLTable('the url', which = 5)

A few rows and columns:

> head(my_table[,3:6])
                                        V3    V4          V5     V6
1                             Focus (2015)    WB $19,100,000      -
2             Kingsman: The Secret Service   Fox $11,750,000 -36.0%
3 The SpongeBob Movie: Sponge Out of Water  Par. $11,200,000 -32.4%
4                     Fifty Shades of Grey  Uni. $10,927,000 -50.9%
5                       The Lazarus Effect Rela. $10,600,000      -
6                           McFarland, USA    BV  $7,797,000 -29.3%
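
The values above come back as text ("$19,100,000", "-36.0%"), so they usually need cleaning before any arithmetic. The following is a rough sketch in base R on a small hand-typed sample that mirrors the columns printed above; the column names title, studio, weekend_gross and pct_change are illustrative assumptions, since the real headers are not shown here.

```r
# Hypothetical sample mirroring the columns printed above (V3 to V6).
my_table <- data.frame(
  V3 = c("Focus (2015)", "Kingsman: The Secret Service"),
  V4 = c("WB", "Fox"),
  V5 = c("$19,100,000", "$11,750,000"),
  V6 = c("-", "-36.0%"),
  stringsAsFactors = FALSE
)

# Illustrative column names; these are assumptions, not the page's headers.
names(my_table) <- c("title", "studio", "weekend_gross", "pct_change")

# Strip "$" and "," before converting the gross to a number.
my_table$weekend_gross <- as.numeric(gsub("[$,]", "", my_table$weekend_gross))

# Treat a bare "-" as missing, then drop "%" and convert.
pct <- my_table$pct_change
pct[pct == "-"] <- NA
my_table$pct_change <- as.numeric(sub("%", "", pct, fixed = TRUE))
```

On the real table you would apply the same gsub and as.numeric steps to V5 and V6 (or to whatever names you assign).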

Answer 2

It's super simple with rvest:

library(rvest)
library(magrittr)

# read_html() is used here because html() was deprecated in its favour in rvest 0.3.0
pg <- read_html("http://www.boxofficemojo.com/weekend/chart/?view=&yr=2015&wknd=09&p=.htm")
pg %>% html_nodes("table") %>% extract2(5) %>% html_table()

The magrittr package is optional (I just prefer using it in pipelines). If you prefer a non-piped approach, you can drop magrittr / extract2 and simply do:

library(rvest)
# read_html() is used here because html() was deprecated in its favour in rvest 0.3.0
pg <- read_html("http://www.boxofficemojo.com/weekend/chart/?view=&yr=2015&wknd=09&p=.htm")
html_table(html_nodes(pg, "table")[[5]])

Answer 1: 0, accepted, 2015-03-02 10:01:03

Answer 2: 0, 2015-03-02 11:36:14
