I want to scrape multiple tables from different URLs. I managed to scrape one page, but my function fails when I try to scrape across pages and stack the tables into a data frame/list.
library(rvest)
library(tidyverse)
library(purrr)
index <- 225:227
urls <- paste0("https://lsgkerala.gov.in/en/lbelection/electdmemberdet/2010/", index)
get_gram <- function(url){
  urls %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="block-zircon-content"]/a[2]') %>%
    html_text() -> temp
  urls %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="block-zircon-content"]/table') %>%
    html_table() %>%
    as.data.frame() %>% add_column(newcol=str_c(temp))
}
# results <- map_df(urls, get_gram)  # Commented out, but this is what I
# used to get the table when index had just one element, and it worked.
results <- list()
results[[i]] <- map_df(urls,get_gram)
I think I am going wrong at the step where I have to stack the map_df output. Thanks in advance for your help!
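You are passing url to the function but using urls in its body, so every call tries to read all three addresses at once instead of the single page it was given. That is also why it worked when index had only one element: url and urls were then the same string. Try this version, which also reads each page only once: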
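library(rvest)
library(dplyr)
library(tibble)  # add_column() comes from tibble

index <- 225:227
urls <- paste0("https://lsgkerala.gov.in/en/lbelection/electdmemberdet/2010/", index)

get_gram <- function(url) {
  # Parse the page that was passed in (url, not the global urls) once and reuse it
  webpage <- url %>% read_html()
  # Page heading (e.g. the panchayat name), used to label every row of the table
  webpage %>%
    html_nodes(xpath = '//*[@id="block-zircon-content"]/a[2]') %>%
    html_text() -> temp
  webpage %>%
    html_nodes(xpath = '//*[@id="block-zircon-content"]/table') %>%
    html_table() %>%
    as.data.frame() %>%
    add_column(newcol = temp)
}

# map_df() calls get_gram() on each URL and row-binds the results
result <- purrr::map_df(urls, get_gram)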
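To check the per-page logic without hitting the site, here is a sketch that feeds the same function shape two made-up pages given as literal HTML strings (read_html() accepts literal HTML as well as URLs). The make_page() helper, the page titles and the table contents are hypothetical; only the XPath expressions and the function body mirror the answer.

```r
library(rvest)   # read_html(), html_nodes(), html_text(), html_table(); re-exports %>%
library(purrr)   # map_df(); the row-binding needs dplyr installed
library(tibble)  # add_column()

# Hypothetical helper: builds a page shaped like the ones the XPath
# expressions expect (div#block-zircon-content holding two links and a table)
make_page <- function(title, wards) {
  rows <- paste0("<tr><td>", wards, "</td><td>Ward ", wards, "</td></tr>",
                 collapse = "")
  paste0('<html><body><div id="block-zircon-content">',
         '<a href="#">Home</a><a href="#">', title, '</a>',
         '<table><tr><th>Ward No.</th><th>Ward Name</th></tr>', rows,
         '</table></div></body></html>')
}

# Two made-up pages standing in for the real URLs
pages <- c(make_page("Panchayat A", 1:2), make_page("Panchayat B", 1:3))

get_gram <- function(url) {
  # Here each element is literal HTML rather than a URL; the point is that
  # the body uses the argument, not a global vector
  webpage <- read_html(url)
  temp <- webpage %>%
    html_nodes(xpath = '//*[@id="block-zircon-content"]/a[2]') %>%
    html_text()
  webpage %>%
    html_nodes(xpath = '//*[@id="block-zircon-content"]/table') %>%
    html_table() %>%
    as.data.frame() %>%
    add_column(newcol = temp)  # length-1 title is recycled to every row
}

result <- map_df(pages, get_gram)
result
```

Each page's rows carry that page's heading in newcol, and map_df() stacks the per-page data frames into one.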
Consider this approach. Your code suggests there is only one table per page, so html_node (which returns the first match) is enough, and html_table then gives a single data frame per page.
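A small sketch of the difference, using a hypothetical one-table page written as literal HTML: html_node() returns the first matching node, so html_table() yields one data frame, while html_nodes() returns a node set, so html_table() yields a list of data frames. That list is why the html_nodes() version converts its result with as.data.frame().

```r
library(rvest)

# Hypothetical one-table page, written as literal HTML
doc <- read_html(paste0(
  '<html><body><div id="block-zircon-content">',
  '<table><tr><th>Ward No.</th><th>Party</th></tr>',
  '<tr><td>1</td><td>INC</td></tr></table>',
  '</div></body></html>'))

xp <- '//*[@id="block-zircon-content"]/table'

one  <- doc %>% html_node(xpath = xp) %>% html_table()   # first match only
many <- doc %>% html_nodes(xpath = xp) %>% html_table()  # every match

is.data.frame(one)   # TRUE: a single table
is.data.frame(many)  # FALSE: a list with one data frame per matched table
length(many)         # 1
```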
library(tidyverse)
library(rvest)

# Functional sequences: each takes a parsed page and extracts one piece
get_title <- . %>% html_node(xpath = '//*[@id="block-zircon-content"]/a[2]') %>% html_text()
get_table <- . %>% html_node(xpath = '//*[@id="block-zircon-content"]/table') %>% html_table()

urls <- paste0("https://lsgkerala.gov.in/en/lbelection/electdmemberdet/2010/", 225:227)

tibble(urls) %>%
  mutate(
    page = map(urls, read_html),        # parse each URL once
    newcol = map_chr(page, get_title),  # one title per page
    data = map(page, get_table),        # one table per page, as a list-column
    page = NULL, urls = NULL            # drop the helper columns
  ) %>%
  unnest(data)                          # one row per table row, title repeated
Output
# A tibble: 52 x 7
newcol `Ward No.` `Ward Name` `Elected Members` Role Party Reservation
<chr> <int> <chr> <chr> <chr> <chr> <chr>
1 Thiruvananthapuram - Chemmaruthy Grama Panchayat 1 VANDIPPURA BABY P Member CPI(M) Woman
2 Thiruvananthapuram - Chemmaruthy Grama Panchayat 2 PALAYAMKUNNU SREELATHA D Member INC Woman
3 Thiruvananthapuram - Chemmaruthy Grama Panchayat 3 KOVOOR KAVITHA V Member INC Woman
4 Thiruvananthapuram - Chemmaruthy Grama Panchayat 4 SIVAPURAM ANIL. V Member INC General
5 Thiruvananthapuram - Chemmaruthy Grama Panchayat 5 MUTHANA JAYALEKSHMI S Member INC Woman
6 Thiruvananthapuram - Chemmaruthy Grama Panchayat 6 MAVINMOODU S SASIKALA NATH Member CPI(M) Woman
7 Thiruvananthapuram - Chemmaruthy Grama Panchayat 7 NJEKKADU P.MANILAL Member INC General
8 Thiruvananthapuram - Chemmaruthy Grama Panchayat 8 CHEMMARUTHY SASEENDRA President INC Woman
9 Thiruvananthapuram - Chemmaruthy Grama Panchayat 9 PANCHAYAT OFFICE PRASANTH PANAYARA Member INC General
10 Thiruvananthapuram - Chemmaruthy Grama Panchayat 10 VALIYAVILA SANJAYAN S Member INC General
# ... with 42 more rows
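The unnest(data) step is what does the stacking. A toy sketch with made-up titles and tables (not real scraped data) shows the shape change: one row per page with a nested table becomes one row per table row, with the page title repeated.

```r
library(tibble)
library(dplyr)   # for %>%
library(tidyr)   # unnest()

# Made-up stand-ins for what get_title() and get_table() return per page
titles <- c("Page A", "Page B")
tables <- list(
  tibble(`Ward No.` = 1:2, Party = c("INC", "CPI(M)")),
  tibble(`Ward No.` = 1:3, Party = c("INC", "INC", "CPI(M)"))
)

nested  <- tibble(newcol = titles, data = tables)  # 2 rows; data is a list-column
stacked <- nested %>% unnest(data)                 # 5 rows; newcol repeated

stacked
```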