
Web Scraping Iteratively from a WebPage in R

I have a webpage containing a table that spans 243 pages, with 34 rows per page. For page 1, the URL looks like this: http://this-site.com/service/?currpage=1 .

I'd like to get the data from all 243 pages and save it in one CSV file.

So far, the code I am using for a single page is:

library(XML)
library(rvest)  # needed for read_html()

url <- "http://this-site.com/service/?currpage=1"  # the URL must be a quoted string
service <- as.data.frame(readHTMLTable(url))
head(service)
service <- read_html(url)  # alternatively, parse the page with rvest

How do I loop over the page numbers from 1 to 243 so that I fetch all the pages and write them to a single CSV?

library(tidyverse)
library(rvest)

pages <- 1:243
base_url <- "http://this-site.com/service/?currpage="
urls <- paste0(base_url, pages)

get_table <- function(url) {
  url %>%
    read_html() %>%
    html_table() %>%  # html_table() returns a list of tables on the page
    .[[1]]            # keep the first (and only) table
}

results <- map(urls, get_table)  # loop over all 243 URLs

bind_rows(results) %>%
  write_csv("some/path/somewhere.csv")
