I'm trying to extract the bottom table ('Daily Observations') from https://www.wunderground.com/history/daily/us/dc/washington/KDCA/date/2011-1-1 . I got the full XPath of the table element, but querying it returns {xml_nodeset (0)}
as the output. What am I doing wrong here? I used the following code:
library(rvest)
single <- read_html('https://www.wunderground.com/history/daily/us/dc/washington/KDCA/date/2011-1-1')
single %>%
html_nodes(xpath = '/html/body/app-root/app-history/one-column-layout/wu-header/sidenav/mat-sidenav-container/mat-sidenav-content/div/section/div[2]/div/div[5]/div/div/lib-city-history-observation/div/div[2]/table')
It seems that the table component is empty.
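This is a dynamic page: the table is generated by JavaScript in the browser, so rvest
alone will not suffice. read_html() only downloads the static HTML that the server sends and never runs its scripts, so an XPath taken from the rendered page points at elements that are not in that HTML. Nonetheless, you can get the source data from the JSON API.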
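To see the effect without touching the live site, here is a minimal sketch on a hand-written page. The markup is a hypothetical stand-in for the served HTML (loosely modeled on the element names in the question's XPath), not Wunderground's actual source: the placeholder table exists, but the rows a browser would display do not.

```r
library(rvest)

# Hypothetical stand-in for what the server sends: an app shell with an
# empty table placeholder. In a browser, JavaScript would fill in the rows;
# read_html() never executes that script.
served_html <- '
<html><body>
  <app-root>
    <lib-city-history-observation><table></table></lib-city-history-observation>
  </app-root>
  <script>/* the app would build the table rows here at runtime */</script>
</body></html>'

page <- read_html(served_html)

# The (empty) placeholder element is visible to rvest...
length(html_nodes(page, xpath = "//table"))      # 1
# ...but none of the rows the browser shows after running the script
length(html_nodes(page, xpath = "//table//tr"))  # 0
```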
library(tidyverse)
library(rvest)
library(lubridate)
library(jsonlite)
# Read the static HTML. It won't contain the table, but it holds the API key
# we need to retrieve the source JSON.
htm_obj <-
read_html('https://www.wunderground.com/history/daily/us/dc/washington/KDCA/date/2011-1-1')
# Retrieve the API key. It is stored in a <script> node with JavaScript content.
str_apikey <-
html_node(htm_obj, xpath = '//script[@id="app-root-state"]') %>%
html_text() %>% gsub("^.*SUN_API_KEY&q;:&q;|&q;.*$", "", . )
# Build a URL pointing to the API, with the API key as the first key-value pair of the query
url_apijson <- paste0(
"https://api.weather.com/v1/location/KDCA:9:US/observations/historical.json?apiKey=",
str_apikey,
"&units=e&startDate=20110101&endDate=20110101")
# Capture the JSON
json_obj <- fromJSON(txt = url_apijson)
# Wrangle the JSON's contents into the table you need
tbl_daily <-
json_obj$observations %>% as_tibble() %>%
mutate(valid_time_gmt = as_datetime(valid_time_gmt) %>%
with_tz("America/New_York")) %>% # Time zone of the airport (KDCA)
select(valid_time_gmt, temp, dewPt, rh, wdir_cardinal, gust, pressure, precip_hrly) # Equivalents of the columns in the HTML table
tbl_daily
# A tibble: 34 x 8
valid_time_gmt temp dewPt rh wdir_cardinal gust pressure precip_hrly
<dttm> <int> <int> <int> <chr> <lgl> <dbl> <dbl>
1 2010-12-31 23:52:00 38 NA 79 CALM NA 30.1 NA
2 2011-01-01 00:52:00 35 31 85 CALM NA 30.1 NA
3 2011-01-01 01:52:00 36 31 82 CALM NA 30.1 NA
4 2011-01-01 02:52:00 37 31 79 CALM NA 30.1 NA
5 2011-01-01 03:52:00 36 30 79 CALM NA 30.1 NA
6 2011-01-01 04:52:00 37 30 76 NNE NA 30.1 NA
7 2011-01-01 05:52:00 36 30 79 CALM NA 30.1 NA
8 2011-01-01 06:52:00 34 30 85 CALM NA 30.1 NA
9 2011-01-01 07:52:00 37 31 79 CALM NA 30.1 NA
10 2011-01-01 08:52:00 44 38 79 CALM NA 30.1 NA
# ... with 24 more rows
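The individual steps can be sanity-checked offline. The sketch below runs the same key regex, an equivalent URL builder and the JSON parsing on toy inputs. Everything in it is an assumption for illustration: the state fragment and the key abc123 are made up, build_history_url() is a hypothetical helper (not part of any package), and the two JSON records only mirror the first two rows of the output above. The time conversion uses base R, so it runs without lubridate.

```r
library(jsonlite)

# 1. Key extraction on a made-up fragment of the app-root-state script,
#    where the page escapes double quotes as &q;
state_txt <- '{&q;process.env&q;:{&q;SUN_API_KEY&q;:&q;abc123&q;,&q;OTHER&q;:&q;x&q;}}'
# Same regex as in the answer: drop everything up to the key, then everything after it
key <- gsub("^.*SUN_API_KEY&q;:&q;|&q;.*$", "", state_txt)
key  # "abc123"

# 2. Hypothetical helper: the answer's paste0() URL, generalized to any date range
build_history_url <- function(api_key, start, end = start, location = "KDCA:9:US") {
  paste0(
    "https://api.weather.com/v1/location/", location,
    "/observations/historical.json?apiKey=", api_key,
    "&units=e&startDate=", format(start, "%Y%m%d"),
    "&endDate=", format(end, "%Y%m%d")
  )
}
api_url <- build_history_url(key, as.Date("2011-01-01"))

# 3. Toy JSON shaped like the API response; fromJSON() turns the
#    "observations" array of records into a data.frame (null becomes NA)
txt <- '{"observations": [
  {"valid_time_gmt": 1293857520, "temp": 38, "dewPt": null, "wdir_cardinal": "CALM"},
  {"valid_time_gmt": 1293861120, "temp": 35, "dewPt": 31,   "wdir_cardinal": "CALM"}
]}'
obs <- fromJSON(txt)$observations

# 4. valid_time_gmt is Unix epoch seconds (UTC); render it in the station's
#    local time, the base-R counterpart of as_datetime() %>% with_tz()
utc <- as.POSIXct(obs$valid_time_gmt, origin = "1970-01-01", tz = "UTC")
local_time <- format(utc, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
local_time  # "2010-12-31 23:52:00" "2011-01-01 00:52:00"
```

With the real key scraped above, a different day would then be fromJSON(build_history_url(str_apikey, as.Date("2011-07-04"))), assuming the API accepts the same query parameters for other dates.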