R Selenium unable to findElement, returns error "Selenium message: Unable to locate element"
I am scraping this page: "https://lsf.uni-heidelberg.de/qisserver/rds?state=change&type=6&moduleParameter=personalSelect&nextdir=change&next=SearchSelect.vm&target=personSearch&subdir=person&init=y&source=state%3Dchange%26type%3D5%26moduleParameter%3DpersonSearch%26nextdir%3Dchange%26next%3Dsearch.vm%26subdir%3Dperson%26menuid%3Dsearch%26_form%3Ddisplay%26topitem%3Dmembers%26subitem%3D%26field%3DNachname&targetfield=Nachname&_form=display". I want to run the search for each person to collect their email address. I am doing the following, but I cannot find a way to click the search button.
```r
library(rvest)
library(stringr)
library(RSelenium)

# url
uni <- "https://lsf.uni-heidelberg.de/qisserver/rds?state=change&type=6&moduleParameter=personalSelect&nextdir=change&next=SearchSelect.vm&target=personSearch&subdir=person&init=y&source=state%3Dchange%26type%3D5%26moduleParameter%3DpersonSearch%26nextdir%3Dchange%26next%3Dsearch.vm%26subdir%3Dperson%26menuid%3Dsearch%26_form%3Ddisplay%26topitem%3Dmembers%26subitem%3D%26field%3DNachname&targetfield=Nachname&_form=display"
# people's names
r <- read_html(uni)
name <- r %>%
  html_nodes("a") %>%
  html_text()
name <- name[40:length(name)]
name <- gsub("\n", "", name, fixed = TRUE)
name <- gsub("\t", "", name, fixed = TRUE)
# each person's link
link <- r %>%
  html_nodes("a") %>%
  html_attrs() %>%
  as.character()
link <- link[40:length(link)]
link <- str_split(link, '"')
link <- sapply(link, "[", 6)
# loop: with RSelenium, click "search" for each link and get the emails, which are on the next page
rD <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
remDr <- rD[["client"]]
#remDr$navigate("https://ki.se/en/research/professors-at-ki")
for (i in seq_along(link)) {
  # i <- 1  # debugging: uncomment to test the first link only
  #r <- read_html(link[i])
  remDr$navigate(link[i])
  webElem <- remDr$findElement(using = 'xpath', '//*[contains(concat( " ", @class, " " ), concat( " ", "abstand_search", " " ))]//font//input')
  webElem$clickElement()
  # here I get the error
}
```
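If some pages do not contain the button, `remDr$findElement()` stops the whole loop with the "Unable to locate element" error. A minimal sketch, assuming you would rather skip such pages than abort: wrap the lookup in `tryCatch()`. `try_find()` is a hypothetical helper written for this example, not part of RSelenium.

```r
# Hypothetical helper (not part of RSelenium): calls a lookup function and
# returns NULL instead of stopping when it throws an error, such as the
# "Unable to locate element" error raised by remDr$findElement().
try_find <- function(lookup) {
  tryCatch(lookup(), error = function(e) {
    message("Element not found: ", conditionMessage(e))
    NULL
  })
}

# Usage inside the loop (requires a running RSelenium session):
# webElem <- try_find(function() remDr$findElement(using = "css selector",
#                     '.abstand_search + [value="Suche starten"]'))
# if (is.null(webElem)) next
# webElem$clickElement()
```

The lookup is passed as a function so that the error is raised inside `tryCatch()` rather than before it is entered.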
Here are some pointers. When reading the page, I would collect the links with CSS selectors, which are faster and more readable:
```r
library(rvest)

# keep only <a> elements with class "regular" that also carry a name attribute
links <- read_html('https://lsf.uni-heidelberg.de/qisserver/rds?state=change&type=6&moduleParameter=personalSelect&nextdir=change&next=SearchSelect.vm&target=personSearch&subdir=person&init=y&source=state%3Dchange%26type%3D5%26moduleParameter%3DpersonSearch%26nextdir%3Dchange%26next%3Dsearch.vm%26subdir%3Dperson%26menuid%3Dsearch%26_form%3Ddisplay%26topitem%3Dmembers%26subitem%3D%26field%3DNachname&targetfield=Nachname&_form=display') %>%
  html_nodes('.regular[name]') %>%
  html_attr('href')
```
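To see what `.regular[name]` selects without hitting the live site, here is a small offline sketch. The HTML fragment is hypothetical and only mimics the result list; the real page structure may differ. The same selector also returns the person names via `html_text()`, which replaces the positional `name[40:length(name)]` slicing and the `gsub()` cleanup.

```r
library(rvest)

# Hypothetical HTML fragment standing in for the search-result list.
html <- '<html><body>
  <a class="regular" name="p1" href="https://example.org/person?id=1">
    Doe, Jane
  </a>
  <a class="regular" href="https://example.org/help">Help</a>
  <a class="nav" name="n1" href="https://example.org/home">Home</a>
</body></html>'

page <- read_html(html)

# ".regular[name]" requires both class "regular" AND a name attribute,
# so the help link and the navigation link are skipped.
people <- html_nodes(page, ".regular[name]")

links <- html_attr(people, "href")
# trim = TRUE strips the newlines and tabs around the text
names_found <- html_text(people, trim = TRUE)
```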
Then I would use the same strategy to locate the search button:
```r
webElem <- remDr$findElement(using = 'css selector', '.abstand_search + [value="Suche starten"]') # this matches the element that is interactable
```
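In CSS, `A + B` is the adjacent-sibling combinator: it matches a `B` that immediately follows an `A` under the same parent. The offline sketch below uses hypothetical markup (an element with class `abstand_search` holding a hidden input, followed by the real submit button) to show why the combinator narrows the match to a single element. The actual LSF markup is assumed, not copied.

```r
library(rvest)

# Hypothetical markup: a wrapper with class "abstand_search" containing a
# hidden input, directly followed by the visible submit button.
html <- '<html><body><form>
  <span class="abstand_search"><font><input type="hidden" value="Suche starten"></font></span>
  <input type="submit" value="Suche starten">
</form></body></html>'

page <- read_html(html)

# The attribute selector alone matches both inputs.
all_matches <- html_nodes(page, '[value="Suche starten"]')

# The combinator keeps only the element right after .abstand_search.
button <- html_nodes(page, '.abstand_search + [value="Suche starten"]')

html_attr(button, "type")
```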
Finally, I would get the name and the email from the target page:
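```r
# findElement() returns a web element; extract its text / href attribute
name <- remDr$findElement(using = 'css selector', '.regular')$getElementText()[[1]]
email <- remDr$findElement(using = 'css selector', '[href*=mail]')$getElementAttribute("href")[[1]] # could also take the 2nd match for .regular
```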
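The same two selectors can be tried offline with rvest. The HTML below is a hypothetical stand-in for a profile page; in a live session the parsed page could come from `read_html(remDr$getPageSource()[[1]])`. The `href` of a mail link starts with `mailto:`, which is stripped here.

```r
library(rvest)

# Hypothetical profile page: first .regular link holds the name,
# a later link holds the mailto: address.
html <- '<html><body><table>
  <tr><td><a class="regular" href="#">Prof. Dr. Jane Doe</a></td></tr>
  <tr><td><a class="regular" href="mailto:jane.doe@example.org">jane.doe@example.org</a></td></tr>
</table></body></html>'

page <- read_html(html)

# html_node() returns the first match only
person <- html_text(html_node(page, ".regular"), trim = TRUE)

# [href*=mail] matches any href containing "mail"; drop the scheme prefix
email <- sub("^mailto:", "", html_attr(html_node(page, "[href*=mail]"), "href"))
```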
I solved it by using rvest inside the loop, as follows:
```r
library(rvest)
library(stringr)
library(RSelenium)

# use RSelenium to download the emails
rD <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
remDr <- rD[["client"]]
emails <- list()
for (i in seq_along(links)) {
  #r <- read_html(link[i])
  remDr$navigate(links[i])
  webElem <- remDr$findElement(using = 'css selector', '.abstand_search + [value="Suche starten"]') # this matches the element that is interactable
  webElem$clickElement()
  # parse the result page the browser landed on after the click
  r <- read_html(unlist(remDr$getCurrentUrl()))
  mail <- r %>%
    html_nodes("a") %>%
    html_attrs() %>%
    as.character() %>%
    str_subset("mailto:") %>%
    str_remove("mailto:")
  if (length(mail) != 0) {
    a <- str_split(mail, "href")
    a <- unlist(a)
    w <- which(grepl("@", a, fixed = TRUE))
    emails <- c(emails, a[w])
  } else {
    emails <- c(emails, NA)
  }
  rm(mail)
}
```
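Not exactly elegant code, but it works. The names are more complicated; I could not find the right CSS selector or XPath for them. Let me know if you can think of more elegant or faster code, or whether this problem can only be solved by brute force.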
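A possibly more compact alternative to the string splitting in the loop, sketched under the assumption that every address on the result page appears as a `mailto:` link: select those links directly with the attribute selector `a[href^="mailto:"]` and strip the scheme. `extract_emails()` is a hypothetical helper; in the live loop the page could come from `read_html(remDr$getPageSource()[[1]])`, which also avoids downloading the URL a second time.

```r
library(rvest)

# Hypothetical helper: all addresses found in mailto: links on a parsed page,
# or NA_character_ when the page has none.
extract_emails <- function(page) {
  hrefs <- html_attr(html_nodes(page, 'a[href^="mailto:"]'), "href")
  if (length(hrefs) == 0) return(NA_character_)
  # drop the "mailto:" scheme and any "?subject=..." query part
  unique(sub("\\?.*$", "", sub("^mailto:", "", hrefs)))
}

# Two hypothetical result pages: one with an address, one without.
pages <- list(
  read_html('<p><a href="mailto:jane.doe@example.org">Mail</a></p>'),
  read_html('<p><a href="https://example.org">No mail here</a></p>')
)

# lapply keeps one entry per person even if a page lists several addresses
emails <- lapply(pages, extract_emails)
```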