
Get all URLs from an external URL

I'm trying to get all the URLs from a page using jQuery so that I can request them later with $.get(). If the links were on the same page as the script, it would be no problem; I'd just call something like:

var links = document.getElementsByTagName("a");
for(var i=0; i<links.length; i++) {
    alert(links[i].href);
}

Here alert is only there to check that the links were actually parsed. But how can I do the same thing with a URL that is not the current page? Any help would be appreciated. Maybe I'm missing something ridiculously simple, but I am really stumped when it comes to anything JavaScript/jQuery related.

Blatantly copying this answer by Nick Craver (go upvote it), but modifying it for your use case:

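$.get("page.html", function(data) {
  // Put the parsed nodes inside a container: .find() only searches descendants,
  // so top-level <a> elements would otherwise be missed
  var page = $("<div>").append($.parseHTML(data));
  var links = page.find("a");
  // do stuff with links
});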

Note that this will only work if the page you're requesting is on the same origin as your script, or is served with CORS headers that allow cross-origin requests. If it isn't, you'll need to fetch and parse the page with a DOM parser on a backend server instead. Node.js has some great options there, including jsdom.
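One thing to watch for once you have the links: the raw href values (what .attr("href") returns) are relative to the page they came from, not to your own page, so they generally need to be resolved before you pass them to $.get(). Below is a rough sketch of that step using the standard URL constructor. The function name resolveLinks, the sample hrefs, and the example.com URL are placeholders of mine, not part of the original answer.

```javascript
// Resolve raw href attribute values against the URL of the page they came from.
// `hrefs` could be collected with something like:
//   links.map(function () { return $(this).attr("href"); }).get()
// `pageUrl` is the address that was passed to $.get() (placeholder below).
function resolveLinks(hrefs, pageUrl) {
  var resolved = [];
  hrefs.forEach(function (href) {
    if (!href) return; // skip <a> tags without an href
    try {
      var url = new URL(href, pageUrl);
      // keep only links that an HTTP request could plausibly fetch
      if (url.protocol === "http:" || url.protocol === "https:") {
        resolved.push(url.href);
      }
    } catch (e) {
      // malformed href: ignore it
    }
  });
  return resolved;
}

var urls = resolveLinks(
  ["/about", "post.html?id=2", "https://other.example/x", "mailto:me@example.com", undefined],
  "https://example.com/blog/index.html"
);
console.log(urls);
```

The mailto: link and the missing href are dropped; the relative ones come back as absolute URLs under example.com.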

You will have to fetch the other page via an HTTP request ($.get in jQuery does this), and then either convert the returned HTML into a DOM that jQuery can traverse to find the <a> tags for you, or use another method, such as a regular expression, to find all the links within the returned markup.

Edit: Probably don't actually use a regex unless the HTML format is guaranteed and you can guarantee the format of every <a> tag on the page. At that point, it's probably easier to just parse the HTML for real.
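To make the caveat concrete, here is a deliberately naive regex version. It is only an illustration of the approach and of where it breaks; the function name findHrefsWithRegex and the sample markup are made up for this example.

```javascript
// Naive extraction: matches href="..." or href='...' inside an <a ...> tag.
// It misses unquoted attribute values and still matches commented-out markup.
function findHrefsWithRegex(html) {
  var pattern = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  var hrefs = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    hrefs.push(match[2]); // group 2 is the text between the quotes
  }
  return hrefs;
}

var sample =
  '<a href="/one">1</a>' +
  "<a class='x' href='two.html'>2</a>" +
  "<a href=three.html>3</a>" +        // unquoted value: missed
  '<!-- <a href="/old">old</a> -->';  // commented out: still matched
console.log(findHrefsWithRegex(sample));
```

On this sample the regex skips the perfectly valid unquoted link and reports the one inside the comment, which is the kind of thing a real HTML parser gets right.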

Collect the current page URL using window.location.href, then compare it against the href of each <a> tag in the loop:

var links = document.getElementsByTagName("a");
var thisHref = window.location.href;
for (var i = 0; i < links.length; i++) {
    var tempLink = links[i].href; // declared with var to avoid an implicit global
    // only report links that do not point at the current page URL
    if (tempLink !== thisHref) {
        alert(tempLink);
    }
}

