I am trying to learn web scraping in R with rvest, but I can't get it to pull the data I want, and I'm clearly missing something.
I am trying to pull the table labeled Daily Observations from https://www.wunderground.com/history/monthly/KMCI/date/2014-8. But I can't seem to get rvest to read table, tr, td, or the other standard tags that I'm familiar with.
I tried RSelenium, but the first command just gives me "PATH to JAVA not found. Please check JAVA is installed." So I am trying to use only rvest.
What am I missing here?
Here is the code I have so far if it helps:
library(rvest)
wind_site <- "https://www.wunderground.com/history/monthly/KMCI/date/2014-8"
HTML <- read_html(wind_site)
wind_table_html <- HTML %>% html_nodes("table") %>% html_table()
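read_html() only downloads the static HTML, and this page appears to fill in its tables with JavaScript after it loads, so rvest alone finds no table nodes. Rendering the page in a real browser first works. I have been able to extract the content of the tables with the following code, which runs Selenium in Docker and so does not need a local Java install (you need to install Docker, see https://docs.docker.com/engine/install/ ):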
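library(RSelenium)
library(rvest)

url <- "https://www.wunderground.com/history/monthly/KMCI/date/2014-8"

# Start a Selenium server with Firefox in Docker, published on local port 4445.
# shell() exists only on Windows; on Linux or macOS use system() instead.
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')

# Connect to the Selenium server and load the page in the remote browser.
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
Sys.sleep(5)  # give the page's JavaScript time to render the tables

# Grab the rendered HTML and parse every table it contains.
htmltxt <- remDr$getPageSource()[[1]]
read_html(htmltxt) %>% html_table()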
[[1]]
# A tibble: 14 x 6
X1 X2 X3 X4 X5 X6
<chr> <chr> <chr> <chr> <chr> <chr>
1 Temperature (°F) Max Average Min Polygon NA
2 Max Temperature 97 86.32 74 NA NA
3 Avg Temperature 85.46 76.47 63.5 NA NA
4 Min Temperature 78 68.71 58 NA NA
5 Dew Point (°F) Max Average Min Polygon NA
6 Dew Point 76 66.82 55 NA NA
7 Precipitation (in) Max Average Min Sum Polygon
8 Precipitation 3.47 0.20 0.00 6.28 NA
9 Snowdepth 0.00 0.00 0.00 0.00 NA
10 Wind (mph) Max Average Min Polygon NA
11 Wind 30 8.5 0 NA NA
12 Gust Wind 49 1.32 0 NA NA
13 Sea Level Pressure (in) Max Average Min Polygon NA
14 Sea Level Pressure 29.11 28.88 28.61 NA NA
[[2]]
# A tibble: 226 x 551
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 Time Temp~ Dew ~ Humi~ Wind~ Pres~ Prec~ NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 Aug ~ Aug 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
3 Aug NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
5 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
6 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
7 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
8 5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
9 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
10 7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# ... with 216 more rows, and 529 more variables: X23 <int>, X24 <int>, X25 <int>, X26 <int>, X27 <int>, X28 <int>, X29 <int>,
# X30 <int>, X31 <int>, X32 <int>, X33 <int>, X34 <chr>, X35 <chr>, X36 <chr>, X37 <chr>, X38 <int>, X39 <dbl>, X40 <int>,
# X41 <int>, X42 <dbl>, X43 <int>, X44 <int>, X45 <dbl>, X46 <int>, X47 <int>, X48 <dbl>, X49 <int>, X50 <int>, X51 <dbl>,
# X52 <int>, X53 <int>, X54 <dbl>, X55 <int>, X56 <int>, X57 <dbl>, X58 <int>, X59 <int>, X60 <dbl>, X61 <int>, X62 <int>,
# X63 <dbl>, X64 <int>, X65 <int>, X66 <dbl>, X67 <int>, X68 <int>, X69 <dbl>, X70 <int>, X71 <int>, X72 <dbl>, X73 <int>,
# X74 <int>, X75 <dbl>, X76 <int>, X77 <int>, X78 <dbl>, X79 <int>, X80 <int>, X81 <dbl>, X82 <int>, X83 <int>, X84 <dbl>,
# X85 <int>, X86 <int>, X87 <dbl>, X88 <int>, X89 <int>, X90 <dbl>, X91 <int>, X92 <int>, X93 <dbl>, X94 <int>, X95 <int>, ...
# i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
[[3]]
# A tibble: 32 x 1
X1
<chr>
1 Aug
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
10 9
# ... with 22 more rows
# i Use `print(n = ...)` to see more rows
[[4]]
# A tibble: 32 x 3
X1 X2 X3
<chr> <chr> <chr>
1 Max Avg Min
2 84 74.4 65
3 88 77.0 63
4 86 72.1 66
5 91 78.9 68
6 91 77.8 70
7 89 74.2 68
8 83 73.3 69
9 78 71.6 67
10 83 74.2 66
# ... with 22 more rows
# i Use `print(n = ...)` to see more rows
[[5]]
# A tibble: 32 x 3
X1 X2 X3
<chr> <chr> <chr>
1 Max Avg Min
2 63 61.3 58
3 64 61.2 59
4 68 63.9 60
5 71 66.6 62
6 73 70.1 68
7 72 68.5 65
8 71 68.2 67
9 68 66.8 65
10 70 67.2 65
# ... with 22 more rows
# i Use `print(n = ...)` to see more rows
[[6]]
# A tibble: 32 x 3
X1 X2 X3
<chr> <chr> <chr>
1 Max Avg Min
2 84 65.4 46
3 87 60.6 37
4 96 76.5 49
5 84 67.0 45
6 100 79.2 52
7 97 83.8 55
8 97 84.8 63
9 97 85.0 68
10 96 79.7 60
# ... with 22 more rows
# i Use `print(n = ...)` to see more rows
[[7]]
# A tibble: 32 x 3
X1 X2 X3
<chr> <chr> <chr>
1 Max Avg Min
2 10 6.0 0
3 10 4.9 0
4 20 10.0 6
5 17 10.1 0
6 13 5.7 0
7 21 10.9 3
8 13 8.0 3
9 10 6.5 0
10 12 4.7 0
# ... with 22 more rows
# i Use `print(n = ...)` to see more rows
[[8]]
# A tibble: 32 x 3
X1 X2 X3
<chr> <chr> <chr>
1 Max Avg Min
2 29.0 29.0 28.9
3 29.1 29.0 29.0
4 29.1 29.0 28.9
5 29.0 28.9 28.9
6 29.0 28.9 28.9
7 28.9 28.9 28.8
8 28.8 28.8 28.8
9 28.9 28.9 28.8
10 29.0 28.9 28.9
# ... with 22 more rows
# i Use `print(n = ...)` to see more rows
[[9]]
# A tibble: 32 x 1
X1
<chr>
1 Total
2 0.00
3 0.00
4 0.04
5 0.71
6 0.00
7 0.00
8 3.47
9 0.00
10 0.00
# ... with 22 more rows
# i Use `print(n = ...)` to see more rows
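In the output above, elements [[3]] to [[9]] each hold one block of the daily table, with the real header sitting in the first row. A minimal sketch of how they could be combined into one data frame follows. The helper names promote_header and combine_daily are hypothetical, and the labels in the usage comment are an assumption read off the truncated header row of [[2]] (Time, Temp~, Dew ~, Humi~, Wind~, Pres~, Prec~), so check them against the page before relying on them.

```r
# Hypothetical helper: use the first row of a header-less table as its
# column names and drop that row from the data.
promote_header <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  header <- as.character(unlist(df[1, , drop = FALSE]))
  body <- df[-1, , drop = FALSE]
  names(body) <- header
  rownames(body) <- NULL
  body
}

# Hypothetical helper: bind several such tables side by side, prefixing each
# column name with a label so that "Max", "Avg", "Min" stay distinguishable.
combine_daily <- function(tables, labels) {
  stopifnot(length(tables) == length(labels))
  parts <- Map(function(tbl, label) {
    out <- promote_header(tbl)
    names(out) <- paste(label, names(out), sep = "_")
    out
  }, tables, labels)
  combined <- do.call(cbind, unname(parts))
  # Everything was scraped as text; convert numeric-looking columns.
  combined[] <- lapply(combined, function(col) type.convert(col, as.is = TRUE))
  combined
}

# Usage, assuming the list returned by html_table() was saved as `tables`
# and that the label order below matches the page (an assumption):
# labels <- c("day", "temp", "dew", "humidity", "wind", "pressure", "precip")
# daily <- combine_daily(tables[3:9], labels)
```

The sketch relies on all of the pieces having the same number of rows, which holds for the 32-row tibbles shown above. If the page layout changes, the positions 3:9 will likely need adjusting.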