How do I scrape data from this specific website using R?

I want to download the data from this website.

http://asphaltoilmarket.com/index.php/state-index-tracker/

But the request keeps getting timed out.

I have already tried the following methods, but every one of them times out.

url <- "http://asphaltoilmarket.com/index.php/state-index-tracker/"

# Attempt 1: rvest
library(rvest)
IndexData <- read_html(url)

# Attempt 2: RCurl
library(RCurl)
IndexData <- getURL(url)

# Attempt 3: httr + XML (htmlParse() needs the response body as text)
library(httr)
library(XML)
IndexData <- htmlParse(content(GET(url), as = "text"))

The website opens in a browser without any problem, and I can download the data using Excel and Alteryx.

If by "get the data" you mean "scrape the table on that page", you just need to go a little further.

First, check the site's robots.txt to see whether scraping is allowed. In this case, it contains nothing that disallows scraping.
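If you'd like to do that check from R, here is a simplified base-R sketch. The helper names `wildcard_disallows()` and `is_disallowed()` are hypothetical, and the robots.txt lines are made up for illustration, not copied from the site. It only honours `Disallow` rules in groups addressed to every crawler (`User-agent: *`) and ignores `Allow` lines and wildcards; the CRAN package robotstxt (`paths_allowed()`) covers more of the standard.

```r
# Simplified robots.txt check (hypothetical helper, not a full parser):
# collects Disallow rules from groups that apply to every crawler
# ("User-agent: *") and ignores Allow lines and wildcard patterns.
wildcard_disallows <- function(lines) {
  lines <- trimws(sub("#.*$", "", lines))  # strip comments and padding
  rules <- character(0)
  in_star_group <- FALSE
  prev_was_agent <- FALSE
  for (line in lines) {
    if (grepl("^user-agent:", line, ignore.case = TRUE)) {
      # A User-agent line that follows a rule line starts a new group.
      if (!prev_was_agent) in_star_group <- FALSE
      agent <- trimws(sub("^user-agent:", "", line, ignore.case = TRUE))
      if (agent == "*") in_star_group <- TRUE
      prev_was_agent <- TRUE
    } else if (nzchar(line)) {
      if (in_star_group && grepl("^disallow:", line, ignore.case = TRUE)) {
        path <- trimws(sub("^disallow:", "", line, ignore.case = TRUE))
        # An empty "Disallow:" means nothing is blocked.
        if (nzchar(path)) rules <- c(rules, path)
      }
      prev_was_agent <- FALSE
    }
  }
  rules
}

# TRUE if `path` begins with any disallowed prefix.
is_disallowed <- function(path, rules) {
  any(startsWith(path, rules))
}

# Made-up robots.txt lines; for the real site you could read them with
# readLines("http://asphaltoilmarket.com/robots.txt") instead.
robots <- c("User-agent: *", "Disallow: /wp-admin/")
rules <- wildcard_disallows(robots)
is_disallowed("/index.php/state-index-tracker/", rules)  # FALSE
```

Prefix matching is the core of how `Disallow` paths are compared, which is why `startsWith()` is enough for this rough check.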

You already have the HTML for the site; you just need the CSS selector for the part you want. Use your browser's developer tools or a tool such as SelectorGadget to locate the table and get its CSS selector.
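You can also look for candidate selectors from inside R. The sketch below parses a small hypothetical HTML snippet (standing in for the downloaded page, so it runs offline) and lists the `id` and `class` of every `<table>`; a table with an id such as `tablepress-5` can then be selected with `"#tablepress-5"`.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page: two tables, only one with an id.
page <- read_html('
  <html><body>
    <table class="layout"><tr><td>menu</td></tr></table>
    <table id="tablepress-5" class="tablepress">
      <tr><th>State</th><th>Jan</th></tr>
      <tr><td>Alabama</td><td>$496.27</td></tr>
    </table>
  </body></html>')

# List the id and class of every <table>; missing attributes come back as NA.
tables <- html_nodes(page, "table")
ids <- html_attr(tables, "id")
data.frame(id = ids, class = html_attr(tables, "class"))
```

On the live page you would pass the result of `read_html()` on the real URL instead of the literal string.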

After that, take the parsed HTML, extract the node you're interested in with html_node(), and then convert it to a data frame with html_table().

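library(magrittr)  # provides the %>% pipe
library(rvest)     # read_html(), html_node(), html_table()

# Download and parse the page
html <- read_html("http://asphaltoilmarket.com/index.php/state-index-tracker/")

# Select the table by its CSS id and convert it to a data frame
html %>%
  html_node("#tablepress-5") %>%
  html_table()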
#>             State     Jan     Feb     Mar     Apr     May     Jun     Jul
#> 1         Alabama $496.27 $486.86 $482.16 $498.62 $517.44 $529.20 $536.26
#> 2          Alaska $513.33 $513.33 $513.33 $513.33 $513.33 $525.84 $535.00
#> 3         Arizona $476.00 $469.00 $466.00 $463.00 $470.00 $478.00 $480.00
#> 4        Arkansas $503.50 $500.50 $494.00 $503.00 $516.50 $521.20 $525.00
#> 5      California $305.80 $321.00 $346.20 $365.50 $390.10 $380.50 $345.50
#> 6        Colorado $228.10 $301.45 $320.58 $354.12 $348.70 $277.55 $297.23
#> 7     Connecticut $495.00 $495.00 $495.00 $495.00 $502.50 $502.50 $500.56
#> 8        Delaware $493.33 $458.33 $481.67 $496.67 $513.33 $510.00 $498.33
#> 9         Florida $507.30 $484.32 $487.12 $503.38 $518.52 $517.68 $514.03
#> 10        Georgia $515.00 $503.00 $503.00 $517.00 $534.00 $545.00 $550.00 
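The prices come back as text such as `$496.27`. The sketch below runs the same two steps offline on a hypothetical two-row excerpt (values copied from the output above; the real table has more states and months) and then strips `$` and `,` so the month columns become numeric. The variable name `prices` is illustrative.

```r
library(rvest)

# Hypothetical excerpt shaped like the page's table.
page <- read_html('
  <table id="tablepress-5">
    <thead><tr><th>State</th><th>Jan</th><th>Feb</th></tr></thead>
    <tbody>
      <tr><td>Alabama</td><td>$496.27</td><td>$486.86</td></tr>
      <tr><td>Alaska</td><td>$513.33</td><td>$513.33</td></tr>
    </tbody>
  </table>')

# Same steps as the answer: select the table by id, then convert it.
prices <- html_table(html_node(page, "#tablepress-5"))

# Turn "$496.27"-style strings into numbers in every month column.
month_cols <- setdiff(names(prices), "State")
prices[month_cols] <- lapply(prices[month_cols],
                             function(x) as.numeric(gsub("[$,]", "", x)))
prices
```

The `gsub("[$,]", "", x)` pattern also removes thousands separators, in case any value reaches four digits.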
