
General approach for scraping pages with multiple links on them

Tampermonkey noob here. I wrote a Tampermonkey script that finds the table element on a page with a table full of links, gets every href from it, and puts them into an array.

I need to actually go into each of those links and get some data, then come back to the table page and go into the next link, repeating until the last link. I don't know how to achieve that: when the browser goes back from the first link to the page with the table of links, the script resets and just goes into the first link again.

Thanks,

Edit:

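// Collect the href of every link on this page that points to the same host.
var urls = [];
// Walk the links from last to first, as the original loop did.
for (var i = document.links.length - 1; i >= 0; i--) {
    if (document.links[i].hostname === location.hostname) {
        urls.push(document.links[i].href);
    }
}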

I want to open each link, get the data from it, come back, and then go on to the next link, and so on.

I'm not really sure what you are asking for, but perhaps what you need is a loop? Can you edit your question to provide source code?

I think the solution to what you are asking is: for every link in your array (I'm assuming you have an array of links), make a request (perhaps using jQuery's $.get/$.post or similar) and then do something with the response.
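To make that concrete, here is a minimal sketch that uses the browser's built-in fetch instead of jQuery, so the script never leaves the table page and never resets. The function name `fetchPages` and the `fetchFn` parameter are made up for illustration; `fetchFn` exists only so the function can be tried out with a stub.

```javascript
// Hypothetical helper: request every URL in turn without navigating away.
// `fetchFn` defaults to the browser's fetch; pass a stub to try it elsewhere.
async function fetchPages(urls, fetchFn = fetch) {
    const pages = [];
    for (const url of urls) {
        // Wait for each response before requesting the next page.
        const response = await fetchFn(url);
        if (!response.ok) {
            // Skip pages that did not load instead of aborting the whole run.
            console.warn("Skipping " + url + ": HTTP " + response.status);
            continue;
        }
        // Keep the URL next to its HTML so later steps know where it came from.
        pages.push({ url: url, html: await response.text() });
    }
    return pages;
}
```

In the userscript you would call something like `fetchPages(urls).then(function (pages) { console.log(pages.length); })`. This assumes the linked pages can be requested from the table page; since the links are filtered to the same hostname, that should usually be the case.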

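If you want to do something with all of your responses, push them to an array, and then operate on that array once your for loop is done. Because these requests are asynchronous, that means once every request has completed, not merely once the loop has finished issuing them.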
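As a rough sketch of the "collect first, process afterwards" step, assume the responses were stored as objects with `url` and `html` fields and that the data you want is each page's first `h1`. Both of those are assumptions for illustration, as are the names `extractHeadings` and `parseHtml`; swap in whatever selector matches your pages.

```javascript
// Hypothetical post-processing step: run once, after all responses are in.
// `pages` is assumed to be an array of { url, html } objects.
// `parseHtml` turns an HTML string into a document; in a browser it
// defaults to DOMParser, and a stub can be passed in elsewhere.
function extractHeadings(pages, parseHtml) {
    parseHtml = parseHtml || function (html) {
        return new DOMParser().parseFromString(html, "text/html");
    };
    const results = [];
    for (const page of pages) {
        const doc = parseHtml(page.html);
        const heading = doc.querySelector("h1"); // assumed selector
        results.push({
            url: page.url,
            // null marks pages where the element was not found.
            heading: heading ? heading.textContent.trim() : null,
        });
    }
    return results;
}
```

Parsing the HTML into a detached document means nothing is rendered or navigated to, so the script on the table page keeps running throughout.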


 