How to get all links on a page with JavaScript navigation?
It's easy with classic HTML links:
$('a').map(function(){return this.href}).toArray()
But what if navigation is done via JavaScript, with something like:
<a href='#' onclick='someFn()'>Some link</a>
It's impossible to know the URL without executing someFn, and if you execute it, it doesn't return the URL; instead it navigates the page away. (I have no control over the someFn function and don't know what's inside it or how to change it.)
So, in order to get all N links from the page, you have to load all N of those pages. This is very slow and inefficient.
How can this be solved?
Possible solution: if it were possible to intercept the assignment to window.location, the problem would be solved. You could just click all those links and check the value of window.location without loading new pages. But I don't know whether this is possible (I use PhantomJS, and it seems it can't do it).
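Browsers generally don't let page scripts redefine window.location itself (the property is non-configurable), so one way to try the interception idea outside a browser is to run the handler inside a sandbox where window.location is a stub that records the target instead of navigating. The sketch below assumes Node's built-in vm module; captureNavigation, the stub objects and the sample page script are hypothetical, and real handlers that touch the DOM or other browser globals would need many more stubs.

```javascript
const vm = require("vm");

// Hypothetical helper: runs page code in a vm sandbox whose window.location
// records every navigation target instead of performing it.
function captureNavigation(pageScript, handlerCall) {
  const visited = [];
  const record = (url) => visited.push(String(url));

  // Stub for the Location object: href writes, assign() and replace() are logged.
  const fakeLocation = {
    get href() {
      return visited.length ? visited[visited.length - 1] : "about:blank";
    },
    set href(url) {
      record(url);
    },
    assign: record,
    replace: record,
  };

  // Stub for window: "window.location = url" is logged as well.
  const fakeWindow = {
    get location() {
      return fakeLocation;
    },
    set location(url) {
      record(url);
    },
  };

  const context = vm.createContext({ window: fakeWindow });
  vm.runInContext(pageScript, context); // defines the page's functions
  vm.runInContext(handlerCall, context); // e.g. "someFn()", as in the onclick
  return visited;
}

// Usage with an invented page script (the URL is computed at runtime):
const page = `
  function someFn() { window.location = "/section/" + (2 * 21) + ".html"; }
  function otherFn() { window.location.href = "/other.html"; }
`;
console.log(captureNavigation(page, "someFn()")); // [ '/section/42.html' ]
console.log(captureNavigation(page, "otherFn()")); // [ '/other.html' ]
```

Because the stub only records, every handler can be "clicked" without leaving the page, and even URLs assembled at runtime are captured.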
Note
There are no URLs in the HTML, even after the JavaScript has executed. Yes, in some cases you can use a browser emulator to execute the JS and then parse the HTML that the browser generates dynamically. But that's not the case here: I do use a browser emulator (PhantomJS), but there are no URLs or navigation in the HTML even after the JS has run. All navigation is done in pure JS, with
<a onclick=tonsOfWeirdBlackBoxFunctionsYouCantChange>
If you are trying to parse the already-executed source of a page, you will need to use regex functions to search the string for those URLs.
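A minimal sketch of that string search, assuming the navigation code assigns a literal string to location or location.href, or passes one to location.assign()/location.replace(). findLocationUrls is a hypothetical helper, not a library API, and URLs built at runtime (concatenation, variables, calls into other functions) are deliberately skipped rather than reported half-built.

```javascript
// Hypothetical helper: collect string literals used as navigation targets.
// Group 1/2: location = "..." or location.href = '...' at the end of a statement
//            (or at the closing quote of an inline onclick attribute).
// Group 3/4: location.assign("...") or location.replace('...').
function findLocationUrls(source) {
  const pattern =
    /location(?:\.href)?\s*=\s*(["'])([^"']*)\1(?=\s*(?:[;}\n"']|$))|location\.(?:assign|replace)\(\s*(["'])([^"']*)\3\s*\)/g;
  const urls = [];
  for (const m of source.matchAll(pattern)) {
    urls.push(m[2] !== undefined ? m[2] : m[4]);
  }
  return urls;
}

// Usage on an invented snippet of page source:
const pageSource = [
  'function a(){ window.location = "/a.html"; }',
  'function b(){ location.assign("/b.html"); }',
  'function c(){ window.location = "/item/" + id; }', // computed: skipped
].join("\n");
console.log(findLocationUrls(pageSource)); // [ '/a.html', '/b.html' ]
```

This only helps when the URL appears literally somewhere in the source; a black-box function that assembles it at runtime will not show up.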
If you are trying to parse code at runtime for locations and save them to an array or similar: every function (indeed every object) in JavaScript has a toString function.
That is, if you define your functions as objects (function expressions assigned to variables):
//Although you should really be using a parameter for this...
//...I'm trying to hold context with your use case.
var redirectToContact = function(){
window.location = "/contact.html";
}
You can call redirectToContact.toString() and run regex functions on the result.
Maybe something like:
var source = redirectToContact.toString();
var match = source.match(/window\.location\s*=\s*"([^"]*)"/);
console.log(match ? match[1] : null); // logs /contact.html
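To apply the toString idea to onclick handlers like someFn(), one could take the handler text, look up the named function on the global object, and scan its source. The sketch below assumes the onclick text is a plain call such as "someFn()" and uses a hypothetical scope object standing in for window; urlFromOnclick is an illustrative name, not an existing API.

```javascript
// A literal assigned to location / location.href, ending the statement.
const ASSIGNED_URL = /location(?:\.href)?\s*=\s*(["'])([^"']*)\1\s*(?:;|\}|$)/;

// Hypothetical helper: "someFn()" -> scope.someFn -> its source -> URL literal.
function urlFromOnclick(onclickText, scope) {
  // Pull the called function's name out of text like "someFn()" or "someFn(1)".
  const call = onclickText.match(/^\s*([A-Za-z_$][\w$]*)\s*\(/);
  if (!call) return null;
  const fn = scope[call[1]];
  if (typeof fn !== "function") return null;
  // Function.prototype.toString returns the function's source text.
  const m = fn.toString().match(ASSIGNED_URL);
  return m ? m[2] : null;
}

// Stand-in for window; the function bodies are never executed here.
const scope = {
  redirectToContact: function () {
    window.location = "/contact.html";
  },
  buildUrl: function (id) {
    window.location = "/item/" + id;
  },
};

console.log(urlFromOnclick("redirectToContact()", scope)); // logs /contact.html
console.log(urlFromOnclick("buildUrl(42)", scope)); // logs null (URL is computed)
```

The same limitation applies as with any static scan: if the handler delegates to other functions or computes the URL, the literal isn't in its own source, and bound or native functions don't expose their source through toString at all.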
I think what you want to do is override the prototype for window.location. This SO post explains how that could be done: Is it possible to override window.location.hostname in Javascript?
However, you have to inject a JavaScript snippet into each page, and it must run before any other scripts. I have been working on similar functionality for the Crawljax web crawler. I use the same kind of mechanism to detect clickables here.