Web scraping html with rvest and R

Question

I would like to scrape the website https://www.askramar.com/Ponuda. First, I need to collect all the links that lead to the individual car pages. The links look like this in the HTML structure:

I tried the following code, but I get an empty object in R:

library(rvest)

url <- "https://www.askramar.com/Ponuda"
html_document <- read_html(url)

links <- html_document %>%
  html_nodes(xpath = '//*[contains(concat(" ", @class, " "), concat(" ", "vozilo", " "))]') %>%
  html_attr(name = "href")

Is the content loaded by JavaScript on the page? Please help! Thanks!

Answer 1

Yes, the page uses JavaScript to load the content you are interested in. However, it does this simply by sending an XHR GET request to https://www.askramar.com/Ajax/GetResults.cshtml . You can do the same:

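library(rvest)

# Endpoint the page queries via XHR; "stranica" is the page number
url <- "https://www.askramar.com/Ajax/GetResults.cshtml?stranica="

links <- list()
for (i in 1:45) {
  # Requests pages 0 to 44 and keeps the href of every car link
  links[[i]] <- httr::GET(paste0(url, i - 1)) %>%
    read_html() %>%
    html_nodes(xpath = '//a[contains(@href, "Vozilo")]') %>%
    html_attr(name = "href")
}

# Flatten the per-page list into a single character vector
links <- do.call("c", links)

print(links)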


# [1] "Vozilo?id=17117" "Vozilo?id=17414" "Vozilo?id=17877" "Vozilo?id=17834"
# [5] "Vozilo?id=17999" "Vozilo?id=18395" "Vozilo?id=17878" "Vozilo?id=16256"
# [9] "Vozilo?id=17465" "Vozilo?id=17560" "Vozilo?id=17912" "Vozilo?id=18150"
#[13] "Vozilo?id=18131" "Vozilo?id=17397" "Vozilo?id=18222" "Vozilo?id=17908"
#[17] "Vozilo?id=18333" "Vozilo?id=17270" "Vozilo?id=18105" "Vozilo?id=16803"
#[21] "Vozilo?id=16804" "Vozilo?id=17278" "Vozilo?id=17887" "Vozilo?id=17939"
# ...plus 1037 further elements
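
The hrefs printed above are relative (for example "Vozilo?id=17117"), so they have to be resolved against the site before the individual car pages can be requested. A minimal sketch using url_absolute() from xml2 (the package rvest builds on), under the assumption that the links resolve relative to the listing page https://www.askramar.com/Ponuda; the two sample hrefs are copied from the output above:

```r
library(xml2)

# Two relative hrefs taken from the printed output
relative_links <- c("Vozilo?id=17117", "Vozilo?id=17414")

# Assumption: links resolve against the listing page they belong to
base_url <- "https://www.askramar.com/Ponuda"

# Turn each relative href into a full URL
car_urls <- url_absolute(relative_links, base_url)
print(car_urls)
```

Each element of car_urls can then be passed to read_html() to scrape the corresponding car page.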

Answer 2

If you inspect the network traffic on the page, you can see that it sends GET requests with many query parameters, the most important being 'stranica' (the page number). Using this information, I did the following:

library(rvest)

stranice <- 1:3

askramar_scrap <- function(stranica) {
  url <- paste0("https://www.askramar.com/Ajax/GetResults.cshtml?stanje=&filter=&lokacija=&",
                "pojam=&marka=&model=&godinaOd=&godinaDo=&cijenaOd=&cijenaDo=&snagaOd=&snagaDo=&",
                "karoserija=&mjenjac=&boja=&pogon4x4=&sifra=&stranica=", stranica, "&sort=")
  html_document <- read_html(url)
  # Return the href of every link that points to a car page
  html_document %>%
    html_nodes(xpath = '//a[contains(@href, "Vozilo")]') %>%
    html_attr(name = "href")
}

links <- lapply(stranice, askramar_scrap)
links <- unlist(links)
links <- unique(links)

Hope that is what you need.
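
Both answers hard-code the page range (1:45 and 1:3), which goes stale as cars are added or removed. One possible alternative is to keep requesting pages until one contains no car links. The sketch below is only an illustration: collect_links, extract_links and fake_fetch are hypothetical names, and it assumes that a page past the end of the results contains no matching links, which has not been verified against the site. To keep it runnable offline, the pages come from a small in-memory stand-in instead of the real endpoint:

```r
library(rvest)

# Parse one page of results (HTML text) into a character vector of hrefs
extract_links <- function(html_text) {
  read_html(html_text) %>%
    html_nodes(xpath = '//a[contains(@href, "Vozilo")]') %>%
    html_attr(name = "href")
}

# Hypothetical helper: request pages until one yields no links.
# fetch_page is any function that maps a page number to HTML text.
collect_links <- function(fetch_page, first_page = 0, max_pages = 200) {
  links <- character(0)
  for (page in seq(first_page, length.out = max_pages)) {
    page_links <- extract_links(fetch_page(page))
    if (length(page_links) == 0) break  # assumed end of the results
    links <- c(links, page_links)
  }
  unique(links)
}

# Offline stand-in for the site: two pages of results, then an empty one
fake_pages <- c(
  '<div><a href="Vozilo?id=1">a</a><a href="Vozilo?id=2">b</a></div>',
  '<div><a href="Vozilo?id=2">b</a><a href="Vozilo?id=3">c</a></div>',
  '<div></div>'
)
fake_fetch <- function(page) fake_pages[page + 1]

demo_links <- collect_links(fake_fetch)
print(demo_links)
```

Against the real site, fetch_page could wrap httr::GET on the GetResults.cshtml URL and return the response body as text; max_pages is there as a safety cap in case the end-of-results assumption does not hold.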

2 answers

solution1
2 ACCEPTED 2019-12-30 16:10:36

solution2
1 2019-12-30 17:18:10
