Web scraping in R using getURL

Question

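Hi, I am trying to use R to read the data on the world's most powerful brands from the link "http://www.forbes.com/powerful-brands/list/3/#tab:rank" into a data frame.

I am a beginner, so I tried the following code to retrieve the data: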

  library(XML)
  library(RCurl)
  # Read and parse HTML file
  forbe = 'http://www.forbes.com/powerful-brands/list/#tab:rank'

  data <- getURL('http://www.forbes.com/powerful-brands/list/#tab:rank')
  data
  htmldata <- readHTMLTable(data)
  htmldata

Can anyone help me retrieve the data from the web page mentioned above?

Answer 1

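They populate the page via JavaScript using XHR requests, so the table is not in the HTML that getURL returns. Use your browser's developer tools to look at the network requests, then fetch the JSON directly: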

brands <- jsonlite::fromJSON("http://www.forbes.com/ajax/list/data?year=2015&uri=powerful-brands&type=organization")
str(brands)

## 'data.frame':    100 obs. of  10 variables:
##  $ position          : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ rank              : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ name              : chr  "AT&T" "Accenture" "Adidas" "Allianz" ...
##  $ uri               : chr  "att" "accenture" "adidas" "allianz" ...
##  $ imageUri          : chr  "att" "accenture" "adidas" "allianz" ...
##  $ industry          : chr  "Telecom" "Business Services" "Apparel" "Financial Services" ...
##  $ revenue           : num  132400 32800 14900 131600 87500 ...
##  $ oneYearValueChange: int  17 14 -14 -6 32 13 17 1 -5 -1 ...
##  $ brandValue        : num  29100 12000 6800 6600 28100 ...
##  $ advertising       : num  3272 88 NA NA 3300 ...
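
Once the JSON is in a data frame, ordinary data frame operations apply. As a minimal sketch, the hypothetical helper `parse_brands` below parses JSON text and orders the brands by rank. The inline sample reuses a few values from the `str()` output above so that it runs without network access, and it assumes the endpoint returns an array of objects with these field names.

```r
library(jsonlite)

# Hypothetical helper (not part of the original answer):
# parse JSON text and return selected columns ordered by rank.
parse_brands <- function(json_text) {
  brands <- fromJSON(json_text)
  brands[order(brands$rank), c("rank", "name", "industry", "brandValue")]
}

# Inline sample shaped like the rows shown in the str() output above.
sample_json <- '[
  {"rank": 83, "name": "Adidas",    "industry": "Apparel",           "brandValue": 6800},
  {"rank": 12, "name": "AT&T",      "industry": "Telecom",           "brandValue": 29100},
  {"rank": 44, "name": "Accenture", "industry": "Business Services", "brandValue": 12000}
]'

top <- parse_brands(sample_json)
print(top)

# Against the live endpoint (assuming it still responds), the JSON text
# could be fetched with getURL and passed to the same helper:
# json_text <- RCurl::getURL("http://www.forbes.com/ajax/list/data?year=2015&uri=powerful-brands&type=organization")
# top <- parse_brands(json_text)
```

Keeping the parsing step separate from the download makes it possible to check the logic on a small sample before relying on the remote service.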

Answer 2

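Why not try something like this?

download.file(forbe, destfile = "forbes.html", method = "auto", quiet = FALSE, cacheOK = TRUE)
htmldata <- readLines("forbes.html")

download.file saves the page to the file named by destfile rather than returning it, so after reading that file back in, the downloaded HTML is in the htmldata character vector.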
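
A minimal sketch of the download-then-inspect approach, using only base R. To stay runnable offline it downloads a hypothetical stand-in page from a temporary file via a `file://` URL (this assumes a Unix-like path; the page contents are made up for illustration). It then checks whether the saved HTML contains a `<table>` at all, which is a quick way to tell whether a page's data arrives later via JavaScript, as Answer 1 describes.

```r
# Hypothetical stand-in for a page whose list is filled in by JavaScript.
src <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<div id='list'></div>",
  "<script src='app.js'></script>",
  "</body></html>"
), src)

# download.file writes to destfile and returns 0 (invisibly) on success.
dest <- tempfile(fileext = ".html")
status <- download.file(paste0("file://", src), destfile = dest,
                        method = "auto", quiet = TRUE)

# Read the saved file back in; the HTML is now a character vector of lines.
htmldata <- readLines(dest, warn = FALSE)

# If no <table> tag is present, readHTMLTable has nothing to extract.
has_table <- any(grepl("<table", htmldata, ignore.case = TRUE))
print(has_table)
```

For the real page, `forbe` would take the place of the `file://` URL, and a result of `FALSE` would point toward fetching the JSON directly.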

Solution 1
1 2016-01-06 19:55:16

Solution 2
0 2016-01-06 19:13:05
