Web Scraping in R using getURL

Hi, I am trying to read the data on the world's most powerful brands from the link "http://www.forbes.com/powerful-brands/list/3/#tab:rank" into a data frame using R.

I am a beginner, so I tried the following code to retrieve the data:

  library(XML)
  library(RCurl)
  # Read and parse HTML file
  forbe = 'http://www.forbes.com/powerful-brands/list/#tab:rank'

  data <- getURL('http://www.forbes.com/powerful-brands/list/#tab:rank')
  data
  htmldata <- readHTMLTable(data)
  htmldata 

Could anyone please help me retrieve the data from this webpage?

They use XHR requests to populate the page via JavaScript, which is why the table is not in the HTML that getURL returns. Use the browser's Developer Tools to see the network requests and grab the JSON directly:

brands <- jsonlite::fromJSON("http://www.forbes.com/ajax/list/data?year=2015&uri=powerful-brands&type=organization")
str(brands)

## 'data.frame':    100 obs. of  10 variables:
##  $ position          : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ rank              : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ name              : chr  "AT&T" "Accenture" "Adidas" "Allianz" ...
##  $ uri               : chr  "att" "accenture" "adidas" "allianz" ...
##  $ imageUri          : chr  "att" "accenture" "adidas" "allianz" ...
##  $ industry          : chr  "Telecom" "Business Services" "Apparel" "Financial Services" ...
##  $ revenue           : num  132400 32800 14900 131600 87500 ...
##  $ oneYearValueChange: int  17 14 -14 -6 32 13 17 1 -5 -1 ...
##  $ brandValue        : num  29100 12000 6800 6600 28100 ...
##  $ advertising       : num  3272 88 NA NA 3300 ...
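Once the JSON is in a data frame, the usual data frame operations apply. Here is a minimal sketch of ordering by rank and keeping a few columns. It assumes the jsonlite package is installed, and it uses a small inline JSON string (four records copied from the str() output above) as a stand-in for the live response, since the endpoint may no longer be available. The names sample_json, brands_sample and ranked are only illustrative.

```r
library(jsonlite)

# Stand-in for the live response: four records taken from the str() output above.
# JSON null becomes NA, matching the NA values seen in the advertising column.
sample_json <- '[
  {"rank": 12, "name": "AT&T", "industry": "Telecom", "brandValue": 29100, "advertising": 3272},
  {"rank": 44, "name": "Accenture", "industry": "Business Services", "brandValue": 12000, "advertising": 88},
  {"rank": 83, "name": "Adidas", "industry": "Apparel", "brandValue": 6800, "advertising": null},
  {"rank": 87, "name": "Allianz", "industry": "Financial Services", "brandValue": 6600, "advertising": null}
]'

# A JSON array of flat objects is simplified to a data frame
brands_sample <- fromJSON(sample_json)

# Order rows by rank and keep a subset of the columns
ranked <- brands_sample[order(brands_sample$rank),
                        c("rank", "name", "industry", "brandValue")]
head(ranked)
```

With the real response you would apply the same indexing to brands instead of brands_sample.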

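Why don't you try something like this? Basically:

download.file(forbe, destfile = "forbes.html", method = "auto", quiet = FALSE, cacheOK = TRUE)  # "forbes.html" is an arbitrary local file name
htmldata <- readLines("forbes.html", warn = FALSE)

The page is saved to the file, and reading it back puts its lines in the htmldata character vector. Note that this only fetches the static HTML, so anything the page fills in via JavaScript will not be there.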
