
How to get all links on a page with JavaScript navigation?

It's easy with classic HTML links:

$('a').map(function(){return this.href}).toArray()

But, if navigation is done via JavaScript with something like:

<a href='#' onclick='someFn()'>Some link</a>

It's impossible to know the URL without executing someFn, and if you execute it, it doesn't return the URL; it navigates away from the page instead. (I don't have control over someFn, and I don't know what's inside it or how to change it.)

So, in order to get all N links from the page, you have to load all N of those pages. This is very slow and inefficient.

How can this be solved?

A possible solution: if it were possible to intercept assignments to window.location, the problem would be solved. You could just click all those links and check the value of window.location without loading the new pages. But I don't know whether this is possible (I use phantomjs, and it seems it can't do it).

Note

There are no URLs in the HTML, even after the JavaScript has executed. Yes, in some cases you can use a browser emulator to execute the JS and then parse the HTML that was generated dynamically in the browser. That is not the case here: I use a browser emulator (phantomjs), but there are no URLs or navigation in the HTML even after the JS has run. All navigation is done in pure JS, with:

<a onclick=tonsOfWeirdBlackBoxFunctionsYouCantChange>

If you are trying to parse the already executed source of a page, you will need to use regex functions to search the text for those URLs.

If you are trying to inspect code at runtime for locations and save them to an array or similar, note that every function in JavaScript has a toString method that returns its source text.

That is, if you define your functions as objects like this:
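As a rough illustration of the regex search over page source described above, here is a minimal sketch. The helper name findUrlLikeStrings, the sample markup, and the rule that a URL is a quoted string starting with "/" or "http(s)://" are all assumptions made for this example; real pages may build URLs in ways this pattern does not catch.

```javascript
// Hypothetical helper: find quoted strings in page source that look like URLs.
// Assumption: a "URL" is a quoted string starting with "/" or "http(s)://".
function findUrlLikeStrings(source) {
  var pattern = /(["'])((?:https?:\/\/|\/)[^"'\s]+)\1/g;
  var found = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[2]); // group 2 is the text between the quotes
  }
  return found;
}

// Made-up sample source, standing in for the text of a fetched page.
var sampleSource =
  "<a href='#' onclick='go(\"/contact.html\")'>Contact</a>\n" +
  "<script>var next = 'https://example.com/page2';</script>";

console.log(findUrlLikeStrings(sampleSource));
```

This only finds URLs that appear as literal strings; anything assembled at runtime from pieces would be missed.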

//Although you should really be using a parameter for this...
//...I'm trying to hold context with your use case.
var redirectToContact = function(){
  window.location = "/contact.html";
}

You can call redirectToContact.toString() and run a regex on the result. Maybe something like:

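var source = redirectToContact.toString();
// Escape the dot and stop at the closing quote; match() returns null when nothing is found.
var match = source.match(/window\.location\s*=\s*"([^"]*)"/);
console.log(match ? match[1] : null); // logs /contact.html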
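The same idea can be wrapped in a small helper that collects every literal URL assigned to the location inside a function. This is only a sketch: extractLocationUrls and redirectToAbout are hypothetical names, and it assumes the handler assigns a plain string literal to window.location, location, or location.href. URLs that are computed at runtime will not be found.

```javascript
// Hypothetical helper: collect string literals assigned to window.location
// (or location / location.href) inside a function's source text.
function extractLocationUrls(fn) {
  var source = fn.toString();
  var pattern = /(?:window\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1/g;
  var urls = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    urls.push(match[2]); // group 2 is the text between the quotes
  }
  return urls;
}

// Example handlers. They are never called, only read as text,
// so no navigation happens.
var redirectToContact = function () {
  window.location = "/contact.html";
};
var redirectToAbout = function () {
  location.href = '/about.html';
};

console.log(extractLocationUrls(redirectToContact));
console.log(extractLocationUrls(redirectToAbout));
```

Because the functions are only converted to strings, this avoids loading any of the target pages, at the cost of missing anything that is not a literal.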

I think what you want to do is override the prototype for window.location. This SO post explains how that could be done: "Is it possible to override window.location.hostname in Javascript?"

However, you have to inject a JavaScript snippet into each page so that it runs before any other scripts. I have been working on similar functionality for the Crawljax web crawler, where I use the same kind of mechanism to detect clickables.
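To make the interception idea concrete, here is a minimal sketch of the mechanism on a plain stand-in object rather than the real window. This is an assumption-heavy illustration: createRecordingWindow and makeHandlers are hypothetical names, and browsers may not allow window.location itself to be redefined this way, so treat it as a picture of the technique, not a confirmed recipe for phantomjs.

```javascript
// Stand-in for the page's global object. Assumption: in a real browser,
// window.location may be protected and not redefinable like this.
function createRecordingWindow() {
  var attempted = [];
  var fakeWindow = {};
  Object.defineProperty(fakeWindow, "location", {
    // Reading location returns the last URL a handler tried to navigate to.
    get: function () { return attempted[attempted.length - 1]; },
    // Writing location records the URL instead of navigating.
    set: function (url) { attempted.push(String(url)); },
    configurable: true
  });
  return { window: fakeWindow, attempted: attempted };
}

// Hypothetical black-box handlers that only assign to window.location.
function makeHandlers(window) {
  return [
    function () { window.location = "/contact.html"; },
    function () { window.location = "/about.html"; }
  ];
}

var recorder = createRecordingWindow();
makeHandlers(recorder.window).forEach(function (handler) { handler(); });
console.log(recorder.attempted);
```

Each handler runs, but its assignment lands in the setter, so the attempted URLs are collected without any page load.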


 