Selenium returns a pending promise

I am trying to build a web scraper. The goal is to download the PDFs that are reachable through a series of links on a webpage. For now, I am trying to retrieve the URLs that point to the PDF files, so that I can pass them to e.g. node download helper (or maybe wget). Ideally, I would end up with an array of the links that I can then iterate through.

Currently, the function looks like this.

function scrape(){
    driver.get('https://examplelink.com/pagewheretofindthedifferentlinks')
    .then(function(){
        return links = driver.findElements(By.partialLinkText('ABCD.')); // all the links contain the same pattern, let's say 'ABCD.BLABLA.BLABLA'
    })
    .then(function(links){
        console.log(links[0].getAttribute('href'))
    })
}

For some reason this prints:

Promise { <pending> }

I have tried many different forms of async/await, but nothing seems to work.

I have also tried clicking the link and then calling driver.getCurrentUrl(), but this just returns the URL of the original page ('https://xxx.be/xxx') and not the URL of the tab that was opened, which would mean implementing a function that makes the driver switch between the different tabs...

Thank you in advance!

OK, this is how I currently got it working:

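// Assumes `driver` and `By` come from selenium-webdriver and are already set up.

// Resolve the href of the element at position `index` in `elements`.
// getAttribute() returns a promise, so the caller has to use .then() on the result.
function findHref(elements, index) {
   return driver.wait(elements[index].getAttribute('href'));
}

// delete all cookies
// driver.manage().deleteAllCookies();

// Navigate to the page and log the href of every link on it
function scrape() {
   console.log('Starting scrape process');
   return driver.get('https://blabla.com/blabla')
      .then(function () {
         // Return the promise directly instead of assigning to an implicit global
         return driver.findElements(By.tagName('a'));
      })
      .then(function (links) {
         for (let i = 0; i < links.length; i++) {
            findHref(links, i)
               .then(function (href) {
                  console.log('This is link: ' + href);
               });
         }
      });
}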
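
For what it's worth, the reason the first attempt printed Promise { <pending> } is that getAttribute('href') itself returns a promise, and it was logged before it resolved. Below is a minimal sketch of the same idea with async/await that hands back the array of links the question asks for. The name collectHrefs is made up for illustration, and the function assumes the caller passes in an already configured selenium-webdriver driver plus a locator; it is not the only way to do this.

```javascript
// Sketch: resolve every link's href before using it.
// `driver` is assumed to be a selenium-webdriver WebDriver instance and
// `locator` a By locator; both are supplied by the caller.
async function collectHrefs(driver, locator) {
  // findElements resolves to an array of WebElement objects.
  const links = await driver.findElements(locator);
  // getAttribute returns one promise per element, so wait for all of them.
  const hrefs = await Promise.all(
    links.map((link) => link.getAttribute('href'))
  );
  // Drop anchors without an href and remove duplicates, keeping order.
  return [...new Set(hrefs.filter((href) => href))];
}

// Hypothetical usage inside an async function, reusing the URL and
// locator from the question:
//   await driver.get('https://examplelink.com/pagewheretofindthedifferentlinks');
//   const pdfUrls = await collectHrefs(driver, By.partialLinkText('ABCD.'));
//   for (const url of pdfUrls) console.log('This is link: ' + url);
```

Taking the driver as a parameter keeps the function independent of any global, and the resulting array can be passed straight to whatever download tool is used afterwards.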
