How to get all links on a page with JavaScript navigation?

This is easy with classic HTML:

$('a').map(function(){return this.href}).toArray()

But if navigation is done via JavaScript, with something like:

<a href='#' onclick='someFn()'>Some link</a>

Without executing someFn it is impossible to know the URL, and if you do execute it, it doesn't return the URL; it navigates away from the page instead. (I have no control over someFn, and I don't know what is inside it or how to change it.)

So, to get all N links from a page you have to load all N of those pages. This is very slow and inefficient.

How can this be solved?

A possible solution: if calls to window.location could be intercepted, the problem would be solved. You could just click all those links and check the value of window.location without loading new pages. But I don't know whether this is possible (I use phantomjs and it seems it can't do it).

Note

There are no URLs in the HTML, even after the JavaScript is executed. Yes, in some cases you can use a browser emulator to execute the JS and then parse the HTML generated dynamically in the browser. But that is not the case here: I use a browser emulator (phantomjs), and there are still no URLs or navigation in the HTML after the JS has executed. All navigation is done in pure JS, with

<a onclick=tonsOfWeirdBlackBoxFunctionsYouCantChange>

If you are trying to parse the already-executed source of a page, you will need to use regex functions to search the string for those URLs.

If you are trying to parse code at runtime for locations and save them to an array or something, note that every function, like every object in JavaScript, has a toString function.

That is, if you define your functions as objects (for example, as function expressions assigned to variables):

//Although you should really be using a parameter for this...
//...I'm trying to hold context with your use case.
var redirectToContact = function(){
  window.location = "/contact.html";
}

You can call redirectToContact.toString() and run regex functions on the result:

Maybe something like:

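var source = redirectToContact.toString();
// Escape the dot, tolerate whitespace, and guard against a failed match.
var match = source.match(/window\.location\s*=\s*"([^"]*)"/);
console.log(match ? match[1] : null);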
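
The same idea can be stretched to cover more than one navigation pattern. The following is only a sketch, not a complete solution: extractUrlsFromSource is a hypothetical helper name, and the patterns are assumed to cover only string-literal URLs that are assigned to location (or location.href) or passed to location.assign / location.replace. URLs that are built at runtime will not be found this way.

```javascript
// Hypothetical helper: scans a function's source text for common
// navigation patterns and returns the string-literal URLs it finds.
function extractUrlsFromSource(fn) {
  var source = fn.toString();
  var patterns = [
    // window.location = "..."  or  location.href = '...'
    /\blocation(?:\.href)?\s*=\s*(["'])(.*?)\1/g,
    // location.assign("...")  or  location.replace('...')
    /\blocation\.(?:assign|replace)\(\s*(["'])(.*?)\1/g
  ];
  var urls = [];
  patterns.forEach(function (pattern) {
    var match;
    // exec() with the g flag walks through every occurrence in the source.
    while ((match = pattern.exec(source)) !== null) {
      urls.push(match[2]); // group 2 is the text between the quotes
    }
  });
  return urls;
}

// Usage with the function from the example above (never actually called):
var redirectToContact = function () {
  window.location = "/contact.html";
};
console.log(extractUrlsFromSource(redirectToContact));
```

Because the function is only converted to a string and never called, nothing navigates. The trade-off is that this sees only what is literally written in the source.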

I think what you want to do is override the prototype for window.location. This SO post explains how that could be done: Is it possible to override window.location.hostname in Javascript?

However, you have to inject a JavaScript snippet into each page so that it runs before any other scripts. I have been working on similar functionality for the Crawljax web crawler. I use the same kind of mechanism to detect clickables here.
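
As a rough illustration of the interception idea, the sketch below leaves the real window.location alone and instead re-compiles a handler from its source against a stand-in object that records navigation attempts. createRecordingWindow and collectUrls are hypothetical names, not part of Crawljax or phantomjs. The approach assumes that the handler does not rely on closure variables and that it navigates through window.location, location.href, location.assign or location.replace; a bare `location = "..."` assignment is not captured.

```javascript
// Hypothetical helper: builds a stand-in for `window` whose `location`
// records every navigation attempt instead of performing it.
function createRecordingWindow() {
  var recorded = [];
  var record = function (url) { recorded.push(String(url)); };

  // location.assign(url) and location.replace(url) are recorded directly.
  var fakeLocation = { assign: record, replace: record };
  // location.href = url is recorded through a setter.
  Object.defineProperty(fakeLocation, 'href', {
    get: function () {
      return recorded.length ? recorded[recorded.length - 1] : '';
    },
    set: record
  });

  var fakeWindow = {};
  // window.location = url is recorded through a setter on the stand-in.
  Object.defineProperty(fakeWindow, 'location', {
    get: function () { return fakeLocation; },
    set: record
  });

  return { window: fakeWindow, recorded: recorded };
}

// Hypothetical helper: re-compiles the handler from its source text so that
// `window` and `location` inside it refer to the stand-ins, runs it once,
// and returns the URLs it tried to navigate to.
function collectUrls(handler) {
  var sandbox = createRecordingWindow();
  var rebind = new Function(
    'window', 'location',
    'return (' + handler.toString() + ');'
  );
  var rebound = rebind(sandbox.window, sandbox.window.location);
  rebound();
  return sandbox.recorded;
}

// Usage: the handler runs, but nothing actually navigates.
console.log(collectUrls(function () {
  window.location = "/contact.html";
}));
```

Unlike a search over the source text, this runs the handler, so URLs computed at runtime are recorded too, as long as the handler needs nothing from the page beyond `window` and `location`.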
