
Trying to download a cached picture in RSelenium

I am using RSelenium to download a series of newspaper articles from an online repository. So far I have been using the remDr$screenshot() function, but because of resolution, zoom and framing issues, I wonder whether it is possible to download the picture exactly as it is presented. The sample code to access a page is the following:

library(RSelenium)
rD1 <- rsDriver(browser = "firefox", port = 4567L)
remDr <- rD1[["client"]]

url1 <- "http://memoria.bn.br/DocReader/DocReader.aspx?"
url2 <- "bib=090972_07&pesq=cangaceiro&pasta=ano%20192"
remDr$navigate(paste0(url1, url2))

Looking at the page source, I see that the image (the element with id DocumentoImg) is served from a cache URL: cache/2286106490137/I0000051-20Alt=000869Lar=000615LargOri=005060AltOri=007149.JPG. Is there a way to download it directly from this address, without relying on screenshots?
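The address can also be read from the running session instead of being copied from the page source. The following is only a minimal sketch: `get_image_url()` and `resolve_src()` are hypothetical helper names, and it assumes the image element still has the id DocumentoImg and that a relative `src` is relative to the directory of the current page. `findElement()`, `getElementAttribute()` and `getCurrentUrl()` are RSelenium methods; the driver may already return an absolute URL for `src`, in which case the helper leaves it unchanged.

```r
# Hypothetical helper: turn a page-relative src into an absolute URL.
# Does not handle root-relative paths such as "/cache/...".
resolve_src <- function(page_url, src) {
  if (grepl("^https?://", src)) return(src)   # already absolute
  base <- sub("[?#].*$", "", page_url)        # drop query string and fragment
  base <- sub("[^/]*$", "", base)             # drop file name, keep trailing "/"
  paste0(base, src)
}

# Hypothetical helper: read the src of the image element from a live
# RSelenium session (remDr is the client created with rsDriver()).
get_image_url <- function(remDr, id = "DocumentoImg") {
  img  <- remDr$findElement(using = "id", value = id)
  src  <- img$getElementAttribute("src")[[1]]
  page <- remDr$getCurrentUrl()[[1]]
  resolve_src(page, src)
}

# Usage (needs the session started above):
# img_url <- get_image_url(remDr)
```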

Yes, you can download the image directly in R like this:

# I have split the url just to make it legible on screen here
url_pt1  <- "http://memoria.bn.br/DocReader/cache/2627304510157"
url_pt2  <- "/I0000051-20Alt=001984Lar=001404LargOri=005060AltOri=007149.JPG"
big_url  <- paste0(url_pt1, url_pt2)

# Choose local file location to download file
file_to  <- "download.jpg" 

# mode = "wb" writes the file as binary, which matters for images on Windows
download.file(big_url, file_to, mode = "wb")
#> trying URL 'http://memoria.bn.br/DocReader/cache/2627304510157
#> /I0000051-20Alt=001984Lar=001404LargOri=005060AltOri=007149.JPG'
#> Content type 'text/html; charset=utf-8' length 8457 bytes
#> downloaded 8457 bytes
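
Note that the log above reports a content type of text/html rather than image/jpeg, which suggests the server may have answered with an HTML page instead of the picture (for example if the cache path is no longer valid; that is an assumption, not something confirmed here). A minimal sketch for checking the result, where `is_jpeg_file()` and `fetch_image()` are hypothetical helper names: it downloads in binary mode and then tests whether the file starts with the JPEG signature bytes FF D8 FF.

```r
# Hypothetical helper: TRUE if the file starts with the JPEG signature FF D8 FF.
is_jpeg_file <- function(path) {
  if (!file.exists(path)) return(FALSE)
  sig <- readBin(path, what = "raw", n = 3L)
  identical(sig, as.raw(c(0xFF, 0xD8, 0xFF)))
}

# Hypothetical wrapper: download in binary mode, then check what arrived.
fetch_image <- function(url, destfile) {
  download.file(url, destfile, mode = "wb")
  if (!is_jpeg_file(destfile)) {
    warning("Downloaded file is not a JPEG; the server may have returned an HTML page.")
  }
  invisible(destfile)
}

# Usage (needs network access and a cache URL that is still valid):
# fetch_image(big_url, "download.jpg")
```

If the warning fires, opening the downloaded file in a text editor should show what the server actually sent.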
