JSOUP Select "<br/>"

Hi guys, I'm trying to select the "<br />" tags in an HTML page, but it's not working. Here is the source of the site:

</div><p><a href="http://www.pinoyfitness.com/wp-content/uploads/2014/03/sofitel-manila-half-marathon-2014-poster.jpg"><img src="http://www.pinoyfitness.com/wp-content/uploads/2014/03/sofitel-manila-half-marathon-2014-poster-540x783.jpg" alt="sofitel-manila-half-marathon-2014-poster" width="540" height="783" class="aligncenter size-medium wp-image-32747" /></a></p>
<p>Introducing the Manila Half Marathon happening on August 17, 2014 at the SM Mall of Asia Grounds. This race is for the benefit of the children of <a href="http://www.virlanie.org/" rel="nofollow" target="_blank">Virlanie</a></p>
 <p><font size="3"><strong>Sofitel Manila Half-Marathon 2014</strong></font><br />
August 17, 2014 @ 3AM<br />
SM Mall of Asia<br />
5K/10K/21K<br />
Organizer: RunRio</p>
<p><strong>Registration Fees:</strong><br />
21K &#8211; P950<br />
10K &#8211; P850<br />
5K &#8211; P750</p>

Here is my work so far:

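import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://www.pinoyfitness.com/2014/03/manila-half-marathon-august-17-2014/").timeout(0).get();
Element bod = doc.body();
Elements info = bod.select("br");   // matches the <br> elements themselves, which contain no text
String textString = info.text();

System.out.println(textString);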

I'm trying to retrieve the HTML with the "<br />" tags intact so that I can easily split the text at them and format each piece.

But when I select the "p" elements instead, it prints all the text without the "<br />" tags, like this: "Introducing the Manila Half Marathon happening on August 17, 2014 at the SM Mall of Asia Grounds. This race is for the benefit of the children of Virlanie Sofitel Manila Half-Marathon 2014 August 17, 2014 @ 3AM SM Mall of Asia 5K/10K/21K Organizer"
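A minimal sketch of what the two attempts above run into, assuming jsoup is on the classpath. The class name `BrDiagnostics` is hypothetical, and it parses a shortened copy of one paragraph from a string rather than fetching the URL. `select("br")` does match the tags, but a `<br>` has no text of its own, `text()` on the paragraph flattens it into a single line, and `html()` is the call that keeps the tags:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class BrDiagnostics {

    // Shortened copy of one paragraph from the page source in the question.
    static final String HTML = "<p><strong>Registration Fees:</strong><br />\n"
            + "21K &#8211; P950<br />\n10K &#8211; P850<br />\n5K &#8211; P750</p>";

    public static void main(String[] args) {
        // Parse a local string instead of calling Jsoup.connect(...), so no network is needed.
        Document doc = Jsoup.parse(HTML);

        // select("br") does find the tags...
        Elements brs = doc.select("br");
        System.out.println("br elements found: " + brs.size());

        // ...but <br> is an empty element, so there is no text to collect.
        System.out.println("br text is blank: " + brs.text().trim().isEmpty());

        // text() normalises whitespace and a <br> adds at most one space,
        // so the paragraph comes back as a single line.
        Element p = doc.select("p").first();
        System.out.println("p.text(): " + p.text());

        // html() returns the paragraph's inner HTML with the <br> tags still in it.
        System.out.println("p.html(): " + p.html());
    }
}
```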

I'm new to jsoup, so please go easy on me if this is a newbie mistake. Thanks in advance.

If you want to preserve the <br/> tags in the parsed content, a somewhat simplistic solution is to replace every <br/> tag in the original HTML with a text placeholder before parsing (a handy regexp for this is from here):

html.replaceAll("(?i)<br[^>]*>", "br2n")

Then, after parsing the modified HTML and extracting its text, you could call textString.split("br2n") to break it back into lines, if that is what you've been trying to achieve.
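A self-contained sketch of the placeholder approach end to end, assuming jsoup is on the classpath; `BrPlaceholderSplit`, `paragraphLines` and the sample string are hypothetical names chosen for illustration. It applies the regexp above to the raw HTML, parses the result, and splits each paragraph's text on `br2n`:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class BrPlaceholderSplit {

    // Placeholder from the answer; it must never occur in the page's real text.
    static final String PLACEHOLDER = "br2n";

    // Shortened copy of one paragraph from the page source in the question.
    static final String SAMPLE = "<p><strong>Registration Fees:</strong><br />\n"
            + "21K &#8211; P950<br />\n10K &#8211; P850<br />\n5K &#8211; P750</p>";

    /** Returns, for each <p> in the HTML, its text split into lines at the original <br> tags. */
    static List<List<String>> paragraphLines(String html) {
        // Step 1: swap every <br>, <br/>, <br /> (any letter case) for the placeholder
        // before parsing, so the break positions survive as ordinary text.
        String marked = html.replaceAll("(?i)<br[^>]*>", PLACEHOLDER);
        Document doc = Jsoup.parse(marked);

        List<List<String>> result = new ArrayList<>();
        for (Element p : doc.select("p")) {
            // Step 2: text() flattens the paragraph; split it back apart on the placeholder.
            List<String> lines = new ArrayList<>();
            for (String part : p.text().split(PLACEHOLDER)) {
                String line = part.trim();
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
            result.add(lines);
        }
        return result;
    }

    public static void main(String[] args) {
        for (List<String> lines : paragraphLines(SAMPLE)) {
            for (String line : lines) {
                System.out.println(line);
            }
        }
    }
}
```

For the sample paragraph it should print "Registration Fees:" followed by one line per race fee. The placeholder has to be a string that cannot appear in the page's real text, since any genuine occurrence of "br2n" would also be treated as a line break; that is part of why the approach is described as somewhat simplistic.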
