
Fetching URLs from HTML data into a List in Android Studio

I'm close to what I want, but I'm stuck. I have HTML data in a String named contentString, which I log with Log.i(TAG, "ALL URL : " + contentString); :

<p><b>14th April</b></p>
<p>The wind is south west with 4 to 5 foot of swell at the peak. Streedagh will probably be the best beach break.</p>
<p><span id="more-113"></span></p>
<p>High tide: 1250  3.1m    <span style="color: #ff0000;"> <a href="http://www.bundoransurfco.com/webcam/"><strong>CLICK HERE FOR LIVE PEAK WEBCAM</strong></a></span></p>
<p>Low Tide: 1854 1.4m</p>
<p></p>
<p></p>
<style type='text/css'>
#gallery-1 {
margin: auto;
}
#gallery-1 .gallery-item {
float: left;
margin-top: 10px;
text-align: center;
width: 50%;
}
#gallery-1 img {
border: 2px solid #cfcfcf;
}
#gallery-1 .gallery-caption {
margin-left: 0;
}
/* see gallery_shortcode() in wp-includes/media.php */
</style>
<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-thumbnail'><dl class='gallery-item'>
<dt class='gallery-icon portrait'>
<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="11149460_10152656389992000_7842452340110509403_n" /></a>
</dt></dl><dl class='gallery-item'>
<dt class='gallery-icon portrait'>
<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="14th April" /></a>
</dt></dl><br style="clear: both" />
</div>
<p></p>
<p><b>3 day forecast to April 13th</b></p>
<p>Solid swell and onshore winds for the weekend. Best spots will be Rossnowlagh and Streedagh. Bundoran beaches and reefs will be blown out.</p>
<h1> Wind Charts</h1>
<p><a href="http://www.windguru.cz/int/index.php?sc=103244"><img class="size-thumbnail wp-image-747 alignleft" title="wind guru" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.xcweather.co.uk/"><img class="alignnone size-thumbnail wp-image-749" title="xcweathersmall" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg" alt="" width="67" height="68" /></a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e"><img class="alignnone size-thumbnail wp-image-750" title="buoy weather" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.windguru.cz/int/index.php?sc=103244">Wind Guru</a>       <a href="http://www.xcweather.co.uk/">XC Weather</a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e">Buoy Weather</a></p>

I would like to fetch only the href URLs of the <a rel="prettyPhoto[gallery-113]" ...> tags (there are two in my example).

For that, I'm using a Pattern:

Pattern pattern = Pattern.compile("<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentString);
List<String> urlWithRel = new ArrayList<String>();
String lastString;
List<String> imagesUrl = null;
while (matcher.find()) {
    urlWithRel.add(matcher.group());
    lastString = urlWithRel.toString();
}
Log.i(TAG, "url with rel : " + urlWithRel);
Log.i(TAG, "final url : " + imagesUrl);
Log.i(TAG, "List size : " + imagesUrl.size());

With this first regex I get the two tags I need:

<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'>

Now I want to store only the href URLs. I found a regex that extracts just the URL: (?<=href=).*(?=>)

But the problem is that I can't apply another regex to a List, and if I build a String from the list to run the regex on, it only works for the first element.

Here is my final code (it doesn't work):

Pattern pattern = Pattern.compile("<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentString);
List<String> urlWithRel = new ArrayList<String>();
String lastString;
List<String> imagesUrl = null;
while (matcher.find()) {
    urlWithRel.add(matcher.group());
    lastString = urlWithRel.toString();
    Pattern lastPattern = Pattern.compile("(?<=href=).*(?=>)");
    Matcher lastMatcher = lastPattern.matcher(lastString);
    imagesUrl = new ArrayList<String>();
    while (lastMatcher.find()) {
        imagesUrl.add(lastMatcher.group());
    }
}
Log.i(TAG, "url with rel : " + urlWithRel);
Log.i(TAG, "final url : " + imagesUrl);
Log.i(TAG, "List size : " + imagesUrl.size());

It returns:

final url : ['http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg']
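That result follows from two details of the code above: the second regex runs on urlWithRel.toString(), which is the whole list joined into one string, and .* is greedy, so the match runs from the first href= to the last > in that string. A minimal sketch of one way around it, assuming the goal is one URL per matched tag: apply the second pattern to each list element separately and capture only the text between the quotes. The class and method names (HrefPerTag, extractHrefs) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefPerTag {
    // Group 1 is the text between the quotes that follow href=.
    // [^'"]* cannot run past the closing quote, unlike a greedy .*
    private static final Pattern HREF = Pattern.compile("href=['\"]([^'\"]*)['\"]");

    // Hypothetical helper: takes the matched <a ...> tags, returns their href values.
    static List<String> extractHrefs(List<String> tags) {
        List<String> urls = new ArrayList<String>();
        for (String tag : tags) {
            // One matcher per tag, so a match can never span two tags
            Matcher m = HREF.matcher(tag);
            if (m.find()) {
                urls.add(m.group(1));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urlWithRel = Arrays.asList(
            "<a rel=\"prettyPhoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'>");
        System.out.println(extractHrefs(urlWithRel));
    }
}
```

In the Android code, the call would go after the while loop, for example imagesUrl = HrefPerTag.extractHrefs(urlWithRel);, so the result list is built once instead of being re-created on every iteration.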

If you are willing to use the jsoup library, this is a snippet you could use:

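// Imports: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, java.net.URL, java.net.MalformedURLException
List<URL> urls = new ArrayList<URL>();
Document doc = Jsoup.parse(contentString);
for (Element el : doc.select("a[href]")) {
    if (el.attr("rel").equals("prettyPhoto[gallery-113]")) {
        try {
            urls.add(new URL(el.attr("href")));
        } catch (MalformedURLException e) {
            Log.w(TAG, "Skipping malformed URL: " + el.attr("href"));
        }
    }
}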

The java.net.URL constructor throws MalformedURLException for invalid input, so that exception has to be handled.
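If adding jsoup is not an option, a regex-only variant may be enough for markup as regular as the sample in the question. The sketch below assumes the rel attribute always comes directly after <a, as it does in that sample, and reads the href in the same pass with a capturing group. The names GalleryLinks and extract are made up for illustration, and a regex like this is fragile against markup changes in a way an HTML parser is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GalleryLinks {
    // Group 1: the quote character that opens the href value (' or ").
    // Group 2: the URL, matched lazily up to the same quote character (\1).
    private static final Pattern GALLERY_HREF = Pattern.compile(
        "<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*?href=(['\"])(.*?)\\1");

    // Hypothetical helper: returns the href of every gallery anchor in the HTML.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<String>();
        Matcher m = GALLERY_HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.xcweather.co.uk/\">XC Weather</a>"
            + "<a rel=\"prettyPhoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'>";
        for (String url : extract(html)) {
            System.out.println(url);
        }
    }
}
```

With the question's variables, this would replace both patterns: imagesUrl = GalleryLinks.extract(contentString);.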
