
Regex for matching one url but not the other

Completely new programmer here, having trouble with regular expressions despite trying various online regex testers. I'm working in Eclipse on an Android project. I'm querying an OpenX ad server for a text ad and getting this in return:

var OX_abced445 = '';
OX_abced445 += "<"+"a href=\'http://the.server.url/openx/www/delivery/ck.php?oaparams=2__bannerid=29__zoneid=3__cb=e3efa8b703__oadest=http%3A%2F%2Fsomesite.com\'target=\'_blank\'>This is some sample text to test with!<"+"/a><"+"div id=\'beacon_e3efa8b703\'style=\'position: absolute; left: 0px; top: 0px; visibility:hidden;\'><"+"img src=\'http://the.server.url/openx/www/delivery/lg.php?bannerid=29&amp;campaignid=23&amp;zoneid=3&amp;loc=1&amp;cb=e3efa8b703\' width=\'0\'height=\'0\' alt=\'\' style=\'width: 0px; height: 0px;\' /><"+"/div>\n";
document.write(OX_abced445);

I need to extract the first href URL but not the img src URL, so I figure I should have a regex that looks for everything between href=\' and the next ' character. I also need to extract the link text, i.e. "This is some sample text to test with!", which sits between _blank\'> and <"+"/a>. I've found plenty of regexes for extracting URLs and such, but have struggled to get one working in Eclipse for this particular case. Any assistance would be appreciated.

It is a very bad idea to try to parse JavaScript that generates HTML with regex. Use something like JSoup or Validator.nu for Java or Nokogiri for Ruby instead. If you must use a regex:

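Plain regex:
^.*? href=\\'([^'\\]+)\\'[^>]*>([^<]*)<

or, in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// [^'\\]+ stops at the backslash before the closing quote, so the URL is captured cleanly
Pattern p = Pattern.compile("^.*? href=\\\\'([^'\\\\]+)\\\\'[^>]*>([^<]*)<",
                            Pattern.MULTILINE);
Matcher m = p.matcher(hideousString);
if (m.find()) {
    // Now m.group(1) is the URL and m.group(2) is the text
}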

This captures the href URL in capture group 1 and the link text in capture group 2, but it will break quickly if the site changes its response format.
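For anyone who wants to try this outside Eclipse, here is a minimal, self-contained sketch of the same idea. The class name AdLinkExtractor, the extract helper, and the shortened sample string are made up for illustration. The sketch assumes the response really contains literal backslashes before each single quote, as in the question. It also drops the leading ^.*? since find() scans for the first match anyway, and it keeps the backslash out of the URL group so the closing \' is not captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdLinkExtractor {
    // Regex (unescaped): href=\\'([^'\\]+)\\'[^>]*>([^<]*)<
    // Group 1 = URL (no quotes or backslashes), group 2 = link text.
    private static final Pattern AD_LINK = Pattern.compile(
            "href=\\\\'([^'\\\\]+)\\\\'[^>]*>([^<]*)<");

    /** Returns {url, text} for the first href link, or null if nothing matches. */
    static String[] extract(String response) {
        Matcher m = AD_LINK.matcher(response);
        if (!m.find()) {
            return null; // response format changed or no link present
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Shortened stand-in for the ad server response (hypothetical paths);
        // "\\'" in Java source is a literal backslash followed by a quote.
        String sample = "OX_abced445 += \"<\"+\"a href=\\'http://the.server.url/ck.php"
                + "?oadest=http%3A%2F%2Fsomesite.com\\'target=\\'_blank\\'>"
                + "This is some sample text to test with!<\"+\"/a>"
                + "<\"+\"img src=\\'http://the.server.url/lg.php\\' /><\"+\"/div>\\n\";";

        String[] link = extract(sample);
        if (link != null) {
            System.out.println(link[0]); // the href URL
            System.out.println(link[1]); // the link text
        }
    }
}
```

The img src URL is skipped simply because the pattern anchors on href=. Returning null when nothing matches makes it easy to notice when the server changes its output, rather than getting an IllegalStateException from calling group() after a failed find().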

