简体   繁体   中英

R - web scraping with ''load more'' button

I am trying to get the data about types of beers and locations where they are most popular from this webpage: https://untappd.com/La_Source

I wrote the code:

library(rvest)
library(dplyr)

link = "https://untappd.com/La_Source"
page = read_html(link)

name = page %>% html_nodes(".user") %>% html_text()
place = page %>% html_nodes("a:nth-child(4)") %>% html_text()
user = page %>% html_nodes(".user") %>% html_text()

user_links = page %>% html_nodes(".user") %>%
  html_attr("href") %>% paste("https://untappd.com/", ., sep="")
  
get_city = function(user_link) {
#  user_link= 'https://untappd.com/user/Linty'
  user_page = read_html(user_link)
  user_city = user_page %>% html_nodes(".location") %>%
    html_text() %>% paste(collapse = ",")
  return(user_city)
}

city = sapply(user_links, FUN = get_city, USE.NAMES = FALSE)  

#brewery = page %>% html_nodes("a:nth-child(3)") %>% html_text()

Beer = data.frame(name, place,user,city, stringsAsFactors = FALSE)
write.csv(Beer, "Beer.csv")

which works really nicely and gives me needed data. The issue when I try to get more data by ''pressing load more button '' at the bottom of the page. I am not sure how I can do it in R. Any advices?

You can press the "button show more" with the following code:

library(RSelenium)
url <- "https://untappd.com/La_Source"
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)

remDr$executeScript("scroll(0,100000);")

web_Obj_Show_More <- remDr$findElement("css selector", '#slide > div.cont.brewery-page > div > div.box.activity > div > a')
web_Obj_Show_More$clickElement()

You can also consider the following approach:

library(RDCOMClient)
url <- "https://untappd.com/La_Source"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(15)
doc <- IEApp$Document()

clickEvent <- doc$createEvent("MouseEvent")
clickEvent$initEvent("click", TRUE, FALSE)

web_Obj <- doc$querySelector('#slide > div.cont.brewery-page > div > div.box.activity > div > a')
web_Obj$dispatchEvent(clickEvent)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM