
javascript regex match url

I want to get the URLs from a Bing search results page. I fetch the HTML, and when I run the regex /<h2><a href="(.*?)"/g against it, it gives me:

["<h2><a href="https://www.test.com/"", "<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"", "<h2><a href="http://www.speedtest.net/"", "<h2><a href="http://test.psychologies.com/"", "<h2><a href="http://www.thefreedictionary.com/test"", "<h2><a href="http://fr.wikipedia.org/wiki/Test"", "<h2><a href="http://www.wordreference.com/enfr/test"", "<h2><a href="http://www.sedecouvrir.fr/"", "<h2><a href="http://www.jeuxvideo.com/tests.htm"", "<h2><a href="http://en.wikipedia.org/wiki/Test""]

In my JavaScript code, I used match:

html.match(/<h2><a href="(.*?)"/g);

I only want the URLs. The HTML comes from http://www.bing.com/search?q=test. I have been searching all day, and I think I may need to use a capture group?

Use Array.prototype.map to iterate over the list of matched strings, then run the regular expression on each one with exec and take capture group 1 to get the link.

"use strict";

var links = ['<h2><a href="https://www.test.com/"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"', 
 '<h2><a href="http://www.speedtest.net/"', 
 '<h2><a href="http://test.psychologies.com/"',
 '<h2><a href="http://www.thefreedictionary.com/test"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test"',
 '<h2><a href="http://www.wordreference.com/enfr/test"',
 '<h2><a href="http://www.sedecouvrir.fr/"',
 '<h2><a href="http://www.jeuxvideo.com/tests.htm"',
 '<h2><a href="http://en.wikipedia.org/wiki/Test"'];

var result = links.map(function (link) {
  return /<h2><a href="(.*?)"/.exec(link)[1];
});

console.log(result);

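With the g flag, match returns an array of the full matched strings, not the capture groups. You need to map over that array and pull out the group, something like this:

// html is a string, so call map on the array returned by match
var urls = (html.match(/<h2><a href="(.*?)"/g) || []).map(function (str) {
   return str.replace(/.*href="([^"]+).*/, "$1");
});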

If this is being done within a browser, there's really no need to try to use a regex.

// Log the href of every anchor element on the page
var myNodeList = document.getElementsByTagName('a');
for (var i = 0; i < myNodeList.length; ++i) {
    var anchor = myNodeList[i];
    console.debug(anchor.href);
}
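
To narrow this down to the links the question's regex targets, a selector can stand in for the pattern. This is only a sketch: collectResultLinks is a hypothetical helper name, and the 'h2 > a' selector is an assumption that mirrors the <h2><a href="..."> markup in the question rather than anything confirmed about Bing's page structure.

```javascript
// Hypothetical helper: `root` is any object exposing querySelectorAll,
// such as `document` in a browser.
function collectResultLinks(root) {
  // 'h2 > a' is an assumed selector mirroring the question's regex
  var anchors = root.querySelectorAll('h2 > a');
  // Array.from accepts an array-like (e.g. a NodeList) and a mapping function
  return Array.from(anchors, function (anchor) {
    return anchor.href;
  });
}

// In a browser console on the results page:
// console.log(collectResultLinks(document));
```

Note that the href property of an anchor element returns the fully resolved URL, which may differ from the raw attribute text when the markup uses relative links.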

But as hinted in the comments, if you really want to use regexes, all you need to do is iterate over the results as shown in "How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?". In particular, note the lines:

while (match = re.exec(url)) {
     params[decode(match[1])] = decode(match[2]);
}
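
Applied to the question's pattern, the same loop might look like the sketch below. The function name extractResultUrls and the sample markup are made up for illustration; only the regex comes from the question.

```javascript
// Hypothetical helper: collect every href that directly follows an <h2><a tag.
function extractResultUrls(html) {
  var re = /<h2><a href="(.*?)"/g; // same pattern as in the question
  var urls = [];
  var match;
  // With the g flag, exec() resumes from re.lastIndex on each call
  // and returns null once there are no more matches.
  while ((match = re.exec(html)) !== null) {
    urls.push(match[1]); // match[1] is the first capture group
  }
  return urls;
}

// Made-up sample markup, not the real Bing response.
var sampleHtml =
  '<h2><a href="https://www.test.com/" h="ID=1">Test</a></h2>' +
  '<h2><a href="http://www.speedtest.net/">Speedtest</a></h2>';

console.log(extractResultUrls(sampleHtml));
```

Unlike html.match with the g flag, which returns only the full matched strings, each exec call returns one match together with its capture groups, so no second pass over the results is needed.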
