
HTTP authentication using RSelenium/PhantomJS

Pretty new to RSelenium. I'm working with Chrome for debugging and will then move to PhantomJS for production (just because I can run the script in a loop without browser windows popping up).

I'm trying to scrape an https website that has a pretty vanilla authentication pop-up. When I'm using Chrome, I can use the format https://user:pass@www.somewebsite.com. However, this does not seem to work with PhantomJS. Is there a good way to pipe in credentials when using RSelenium to drive PhantomJS?

If not, is there a better approach? Ironically, I can log into the site using rvest/httr. The problem is that the site is so JavaScript-heavy that I really need RSelenium for navigating and ultimately getting at the data I need.

Some sample code, though unfortunately I can't provide the password-protected site I am referencing:

library(RSelenium)
library(httr)
library(wdman)

# retcommand = TRUE returns the shell command instead of starting the server
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"),
                              retcommand = TRUE)
cat(selCommand)
# Start the Selenium server via a shell script using the command printed above

remDr <- remoteDriver(port = 4567L, browserName = "chrome")
# remDr <- remoteDriver(port = 4567L, browserName = "phantomjs")
remDr$open()
# Works with Chrome, does not work with PhantomJS
remDr$navigate("https://user:pass@www.somewebsite.com")

Any help appreciated, and thanks.

You could log in with Chrome and collect the resulting cookies using getAllCookies. Then, in the PhantomJS browser, call addCookie for each of them.
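
A rough sketch of that cookie hand-off is below. copy_cookies is a hypothetical helper, not part of RSelenium; it only relies on the getAllCookies and addCookie methods of a remoteDriver. It assumes the site keeps the login in session cookies once you are authenticated (plain HTTP basic auth on its own does not), and that the target browser has already navigated to a page on the same domain, since WebDriver generally only accepts cookies for the current domain.

```r
# Hypothetical helper: copy all cookies from one RSelenium session to another.
# `from` and `to` are expected to be open remoteDriver objects.
copy_cookies <- function(from, to) {
  cookies <- from$getAllCookies()
  for (ck in cookies) {
    # Only name, value, path and domain are forwarded; expiry and the
    # secure/httpOnly flags are left at addCookie's defaults for simplicity.
    to$addCookie(
      name   = ck$name,
      value  = ck$value,
      path   = if (is.null(ck$path)) "/" else ck$path,
      domain = ck$domain
    )
  }
  # Return the number of cookies copied, invisibly
  invisible(length(cookies))
}

# Assumed usage (chromeDr and phantomDr are illustrative names):
# chromeDr$navigate("https://user:pass@www.somewebsite.com")
# phantomDr$navigate("https://www.somewebsite.com")  # land on the domain first
# copy_cookies(chromeDr, phantomDr)
# phantomDr$navigate("https://www.somewebsite.com")  # reload with cookies set
```

If the site needs the secure or httpOnly flags preserved, they would have to be passed through to addCookie as well.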

First, should the call use http rather than https?

library(RSelenium)

rD <- rsDriver(browser = "phantomjs")
remDr <- rD$client

# Credentials embedded in the URL
remDr$navigate("http://user:passwd@httpbin.org/basic-auth/user/passwd")
remDr$getPageSource()[[1]]
# [1] "<html><head></head><body><pre style=\"word-wrap: break-word; white-space: pre-wrap;\">{\n  \"authenticated\": true, \n  \"user\": \"user\"\n}\n</pre></body></html>"

# Clean up: remove the driver object and trigger garbage collection
rm(rD)
gc()

Alternatively, if this does not work, you can set a custom header:

# Build the Basic auth header value: "Basic " + base64("user:passwd")
base64pw <- paste("Basic",
                  base64enc::base64encode(charToRaw("user:passwd")))
# Ask PhantomJS to send it as a custom Authorization header
eCaps <- list("phantomjs.page.customHeaders.Authorization" = base64pw)
rD <- rsDriver(browser = "phantomjs", extraCapabilities = eCaps)
remDr <- rD$client

# No credentials in the URL this time
remDr$navigate("http://httpbin.org/basic-auth/user/passwd")
remDr$getPageSource()[[1]]
# [1] "<html><head></head><body><pre style=\"word-wrap: break-word; white-space: pre-wrap;\">{\n  \"authenticated\": true, \n  \"user\": \"user\"\n}\n</pre></body></html>"

# Clean up: remove the driver object and trigger garbage collection
rm(rD)
gc()
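
For reference, the header value built above is just the word Basic, a space, and the base64 encoding of user:password. The sketch below wraps that in two small helpers so the credentials are not hard-coded in the capability list. basic_auth_header and phantom_auth_caps are hypothetical names introduced here, not RSelenium functions. One caveat, as far as I know: PhantomJS sends customHeaders with every request the page issues, including requests for resources on other hosts, so the credentials may travel further than intended.

```r
library(base64enc)

# Hypothetical helper: build the value of an HTTP Basic Authorization header.
basic_auth_header <- function(user, password) {
  credentials <- paste0(user, ":", password)
  paste("Basic", base64encode(charToRaw(credentials)))
}

# Hypothetical helper: wrap the header in the PhantomJS capability used above.
phantom_auth_caps <- function(user, password) {
  list("phantomjs.page.customHeaders.Authorization" =
         basic_auth_header(user, password))
}

eCaps <- phantom_auth_caps("user", "passwd")
print(eCaps[["phantomjs.page.customHeaders.Authorization"]])
```

The resulting eCaps list can be passed to rsDriver via extraCapabilities exactly as in the answer above.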

