
JSOUP Select “<br/>”

Hi guys, I'm trying to select the "<br />" tags in an HTML page with Jsoup and it's not working. Here is the source of the page:

</div><p><a href="http://www.pinoyfitness.com/wp-content/uploads/2014/03/sofitel-manila-half-marathon-2014-poster.jpg"><img src="http://www.pinoyfitness.com/wp-content/uploads/2014/03/sofitel-manila-half-marathon-2014-poster-540x783.jpg" alt="sofitel-manila-half-marathon-2014-poster" width="540" height="783" class="aligncenter size-medium wp-image-32747" /></a></p>
<p>Introducing the Manila Half Marathon happening on August 17, 2014 at the SM Mall of Asia Grounds. This race is for the benefit of the children of <a href="http://www.virlanie.org/" rel="nofollow" target="_blank">Virlanie</a></p>
 <p><font size="3"><strong>Sofitel Manila Half-Marathon 2014</strong></font><br />
August 17, 2014 @ 3AM<br />
SM Mall of Asia<br />
5K/10K/21K<br />
Organizer: RunRio</p>
<p><strong>Registration Fees:</strong><br />
21K &#8211; P950<br />
10K &#8211; P850<br />
5K &#8211; P750</p>

Here is my work so far:

Document doc = Jsoup.connect("http://www.pinoyfitness.com/2014/03/manila-half-marathon-august-17-2014/").timeout(0).get();
Element bod = doc.body();
Elements info = bod.select("br");
String textString = info.text();

System.out.println(textString);

I'm trying to retrieve the HTML with the "<br />" tags intact so that I can easily split the text on them and format it.

But when I select the "p" elements instead, it prints all the text without the "<br />" tags, like this: "Introducing the Manila Half Marathon happening on August 17, 2014 at the SM Mall of Asia Grounds. This race is for the benefit of the children of Virlanie Sofitel Manila Half-Marathon 2014 August 17, 2014 @ 3AM SM Mall of Asia 5K/10K/21K Organizer"

I'm new to Jsoup, so please go easy on me if I've made a newbie mistake. Thanks in advance.

If you want to preserve the <br/> tags in the parsed content, a somewhat simplistic solution to your problem would be to replace all <br/> tags in the original HTML code with a text placeholder before parsing it (a handy regexp for this):

html.replaceAll("(?i)<br[^>]*>", "br2n")

Then, once the modified HTML has been parsed and its text extracted into textString, you could do textString.split("br2n") if this is what you've been trying to achieve.
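To make the placeholder idea concrete, here is a minimal sketch of the replace-then-split flow using only the Java standard library. The class name BrSplitDemo, the method splitOnBreaks and the simplified sample HTML are made up for illustration. The tag-stripping regex in step 2 is only a crude stand-in for Jsoup's text extraction so that the example runs without the library; in the real program that step would be Jsoup.parse(marked).text().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Jsoup.
class BrSplitDemo {
    // Placeholder token from the answer; it must not occur in the page text itself.
    static final String MARKER = "br2n";

    static List<String> splitOnBreaks(String html) {
        // 1. Replace every <br>, <br/> or <br /> (any letter case) with the placeholder.
        String marked = html.replaceAll("(?i)<br[^>]*>", MARKER);

        // 2. Stand-in for Jsoup.parse(marked).text(): drop the remaining tags.
        //    (Assumption: good enough for this simple sample, not for real pages.)
        String text = marked.replaceAll("<[^>]+>", "");

        // 3. Split on the placeholder, trim each piece and skip empty ones.
        List<String> lines = new ArrayList<>();
        for (String part : text.split(MARKER)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Simplified version of the "Registration Fees" paragraph from the question.
        String html = "<p><strong>Registration Fees:</strong><br />\n"
                + "21K - P950<br />\n10K - P850<br />\n5K - P750</p>";
        for (String line : splitOnBreaks(html)) {
            System.out.println(line);
        }
    }
}
```

The important detail is the order: the replacement has to run on the raw HTML string before any text extraction, because the <br/> elements carry no text of their own and are gone once only the text is kept.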
