
Regular expression in HTML to extract specific href

I would like to extract specific data from JSON data: every link that appears in an href attribute inside the <div id='gallery-1'> element.

For example, with this JSON data:

<p><strong style=\"font-size: 13px;\">22nd March</strong></p>\n
<p>Swell is 3 foot and clean but wind swing south west later. Get on the early</p>\n
<p><span id=\"more-113\"></span></p>\n
<p>High tide: 1922 2.6m    <span style=\"color: #ff0000;\"> <a href=\"http://www.bundoransurfco.com/webcam/\">
<strong>CLICK HERE FOR LIVE PEAK WEBCAM</strong></a></span></p>\n
<p>Low Tide: 1249 -0.1m</p>\n<p><b>3 day forecast to March 23rd</b></p>\n
<p>Looks like a fun few days with light winds and a long period swell.</p>\n\n\t\t
<style type='text/css'>\n\t\t\t#gallery-1 {\n\t\t\t\tmargin: auto;\n\t\t\t}\n\t\t\t
#gallery-1 .gallery-item {\n\t\t\t\tfloat: left;\n\t\t\t\tmargin-top: 10px;\n\t\t\t\t
text-align: center;\n\t\t\t\twidth: 50%;\n\t\t\t}\n\t\t\t#gallery-1 img {\n\t\t\t\t
border: 2px solid #cfcfcf;\n\t\t\t}\n\t\t\t
#gallery-1 .gallery-caption {\n\t\t\t\t
margin-left: 0;\n\t\t\t}\n\t\t\t
/* see gallery_shortcode() in wp-includes/media.php */\n\t\t</style>\n\t\t
<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-medium'>
<dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon portrait'>\n\t\t\t\t
<a rel=\"prettyPhoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg'>
<img width=\"225\" height=\"300\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n-225x300.jpg\" 
class=\"attachment-medium colorbox-113 \" alt=\"10411096_10152611456607000_886839954460588268_n\" /></a>\n\t\t\t
</dt></dl>\n\t\t\t
<br style='clear: both' />\n\t\t</div>\n\n
<p><a href=\"http://www.bundoransurfco.com/webcam/\"> </a></p>\n
<h1> Wind Charts</h1>\n<p><a href=\"http://www.windguru.cz/int/index.php?sc=103244\">
<img class=\"size-thumbnail wp-image-747 alignleft\" title=\"wind guru\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a> <a href=\"http://www.xcweather.co.uk/\"><img class=\"alignnone size-thumbnail wp-image-749\" title=\"xcweathersmall\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>       <a href=\"http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e\"><img class=\"alignnone size-thumbnail wp-image-750\" title=\"buoy weather\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a> <a href=\"http://www.windguru.cz/int/index.php?sc=103244\">Wind Guru</a>       <a href=\"http://www.xcweather.co.uk/\">XC Weather</a>       <a href=\"http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e\">Buoy Weather</a></p>\n<h1>Swell Charts</h1>\n<p><a href=\"http://magicseaweed.com/Bundoran-Surf-Report/50/\"><img class=\"alignnone size-thumbnail wp-image-753\" title=\"msw logo\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/msw-logo-67x43.jpg\" alt=\"\" width=\"75\" height=\"43\" /></a>             <a href=\"http://magicseaweed.com/UK-Ireland-MSW-Surf-Charts/1/\"><img class=\"alignnone size-thumbnail wp-image-754\" title=\"magicseaweedwamchart\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/magicseaweedwamchart1-67x68.png\" alt=\"\" width=\"67\" height=\"68\" /></a>       <a href=\"http://www.marine.ie/Home/site-area/data-services/marine-forecasts/wave-forecasts\"><img class=\"alignnone wp-image-755 size-thumbnail\" title=\"marine institute irish bouy data\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/marine-institute-irish-bouy-data-67x42.jpg\" alt=\"\" 
width=\"67\" height=\"42\" /></a>                 <a href=\"http://magicseaweed.com/Bundoran-Surf-Report/50/\">Magic Seaweed</a>      <a href=\"http://magicseaweed.com/UK-Ireland-MSW-Surf-Charts/1/\">MSM WAM</a>          <a href=\"http://www.marine.ie/Home/site-area/data-services/marine-forecasts/wave-forecasts\">Marine Institute</a></p>\n<h1>Pressure, Weather, Tides</h1>\n<p><a href=\"http://news.bbc.co.uk/weather/forecast/13000\"><img class=\"alignnone size-thumbnail wp-image-756\" title=\"bbc pressure\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/bbc-pressure-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>          <a href=\"http://www.met.ie/\"><img class=\"alignnone size-thumbnail wp-image-759\" title=\"met eireann\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/met-eireann-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>            <a href=\"http://news.bbc.co.uk/weather/forecast/13000\">BBC Pressure</a>      <a href=\"http://www.met.ie/\">Met Eireann</a>      <a href=\"http://www.irishtimes.com/weather/tides.html\">Irish Tide Tables</a></p>\n

Fetch only: http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg

I previously used <a.+?href=\\"([^\\"]+) to fetch every href in <a> tags, but that is not what I want here.

Assuming your div is a string and contains only one href, you can use this code instead of a regex to find where the href starts and ends.

    NSRange range = [divString rangeOfString:@"href"]; // start of the attribute
    // End of the tag: search the rest of the string so the range never runs past its end
    NSRange endRange = [divString rangeOfString:@">" options:0 range:NSMakeRange(range.location, divString.length - range.location)];

Then use [divString substringWithRange:] to extract the portion you are interested in.
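Put together, the steps above might look like the following sketch. It assumes the href value is single-quoted, as in the gallery markup; the helper name FirstHrefInString is made up for illustration and is not part of the original answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the value of the first single-quoted href
// attribute in `html`, or nil if there is none.
static NSString *FirstHrefInString(NSString *html) {
    NSRange start = [html rangeOfString:@"href='"];
    if (start.location == NSNotFound) {
        return nil;
    }
    // The attribute value begins right after the opening quote.
    NSUInteger valueStart = NSMaxRange(start);
    // Look for the closing quote only in the remainder of the string,
    // so the search range can never exceed the string length.
    NSRange end = [html rangeOfString:@"'"
                              options:0
                                range:NSMakeRange(valueStart, html.length - valueStart)];
    if (end.location == NSNotFound) {
        return nil; // unterminated attribute value
    }
    return [html substringWithRange:NSMakeRange(valueStart, end.location - valueStart)];
}
```

Calling FirstHrefInString(divString) on the gallery div from the question should return the full-size image URL, since that is the first single-quoted href in that markup.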

Here's a solution based on Alex's answer. It works for multiple hrefs in a single string:

NSString *target = @"<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-medium'><dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon portrait'>\n\t\t\t\t<a rel=\"prettyPhoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg'><img width=\"225\" height=\"300\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n-225x300.jpg' class=\"attachment-medium colorbox-113 \" alt=\"10411096_10152611456607000_886839954460588268_n\" /></a>\n\t\t\t</dt></dl>\n\t\t\t<br style='clear: both' />\n\t\t</div>";
NSMutableArray *hrefs = [NSMutableArray array];
NSUInteger searchStart = 0;
while (searchStart < target.length) {
    // Find the next single-quoted href attribute at or after searchStart.
    NSRange hrefRange = [target rangeOfString:@"href='"
                                      options:0
                                        range:NSMakeRange(searchStart, target.length - searchStart)];
    if (hrefRange.location == NSNotFound) {
        NSLog(@"That's all");
        break;
    }
    // The value starts right after the opening quote.
    NSUInteger valueStart = NSMaxRange(hrefRange);
    NSRange endRange = [target rangeOfString:@"'"
                                     options:0
                                       range:NSMakeRange(valueStart, target.length - valueStart)];
    if (endRange.location == NSNotFound) {
        break; // unterminated attribute value
    }
    NSString *href = [target substringWithRange:NSMakeRange(valueStart, endRange.location - valueStart)];
    [hrefs addObject:href];
    // Continue searching after the closing quote.
    searchStart = NSMaxRange(endRange);
}

As you can see, this implementation is sensitive to quoting: it only matches single-quoted href values, so the search strings need adjusting for double-quoted ones. P.S. It may look a bit messy; it was written and tested quickly.

EDIT: Here's also a variant using a regular expression. It works only with the a tag, and you still need to be careful with quotes (this pattern expects double-quoted values):

NSError *error;
NSString *target = @"<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-medium'><dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon portrait'>\n\t\t\t\t<a rel=\"prettyPhoto[gallery-113]\" href=\"http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg\"><img width=\"225\" height=\"300\" href=\"http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n-225x300.jpg\" class=\"attachment-medium colorbox-113 \" alt=\"10411096_10152611456607000_886839954460588268_n\" /></a>\n\t\t\t</dt></dl>\n\t\t\t<br style='clear: both' />\n\t\t</div>";
// Matches an <a> tag and captures its double-quoted href value in group 1.
NSRegularExpression *regEx = [NSRegularExpression regularExpressionWithPattern:@"<a\\s+(?:[^>]*?\\s+)?href=\"([^\"]*)\""
                                                                       options:0
                                                                         error:&error];
NSArray *array = [regEx matchesInString:target
                                options:0
                                  range:NSMakeRange(0, target.length)];
for (NSTextCheckingResult *match in array){
    NSRange range = [match rangeAtIndex:1];
    NSString *result = [target substringWithRange:range];
    NSLog(@"HREF = %@", result);
}

I also edited the first variant to save all hrefs into an array.
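Since the question asks only for links inside <div id='gallery-1'>, one possible approach is to do it in two steps: first isolate that div, then collect the hrefs of the a tags inside it. The sketch below is not from the answers above. It assumes the JSON has already been decoded into a plain HTML string and that the gallery div contains no nested div (true for the sample data). The helper name HrefsInDiv is hypothetical. A backreference lets the second pattern accept either quote style.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the href values of all <a> tags inside the
// first <div> whose id equals `divId`. Assumes no nested <div> inside it.
static NSArray<NSString *> *HrefsInDiv(NSString *html, NSString *divId) {
    // Step 1: capture everything between the opening tag and the first </div>.
    NSString *escapedId = [NSRegularExpression escapedPatternForString:divId];
    NSString *divPattern = [NSString stringWithFormat:
        @"<div\\s[^>]*?id=['\"]%@['\"][^>]*>(.*?)</div>", escapedId];
    NSRegularExpression *divRegex =
        [NSRegularExpression regularExpressionWithPattern:divPattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    NSTextCheckingResult *divMatch =
        [divRegex firstMatchInString:html options:0 range:NSMakeRange(0, html.length)];
    if (divMatch == nil) {
        return @[];
    }
    NSString *inner = [html substringWithRange:[divMatch rangeAtIndex:1]];

    // Step 2: group 1 captures the opening quote, \1 requires the same
    // closing quote, and group 2 is the href value itself.
    NSRegularExpression *hrefRegex =
        [NSRegularExpression regularExpressionWithPattern:@"<a\\s[^>]*?href=(['\"])(.*?)\\1"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSMutableArray<NSString *> *hrefs = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [hrefRegex matchesInString:inner options:0 range:NSMakeRange(0, inner.length)];
    for (NSTextCheckingResult *match in matches) {
        [hrefs addObject:[inner substringWithRange:[match rangeAtIndex:2]]];
    }
    return hrefs;
}
```

With the sample data, HrefsInDiv(html, @"gallery-1") would skip the webcam and chart links, because they sit outside the gallery div. For markup with nested divs or otherwise irregular HTML, a real HTML parser is likely a safer choice than regular expressions.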
