How to extract specific text from HTML code with JSoup

Question

I have a website that I want to extract some data from. I want to extract the 8a on the second line (the line with the a element) with JSoup. I cannot use a regex, because sometimes the value is just 2 or 7c+ instead of 8a, and those same values can also appear in the text between the a tags. Any ideas?

<div class="vsr"> 
 <a href="/91.1/303535.html">L'Américain (intégral)</a> 8a 
 <span class="ag">7c+</span> 
 <em>Tony Fouchereau</em> 
 <span class="btype">traversée d-g, surplomb, départ assis</span> 
 <span class="glyphicon glyphicon-camera" aria-hidden="true"></span> 
 <span class="glyphicon glyphicon-film" aria-hidden="true"></span> 
</div>

Answer 1

You can use Jsoup CSS selectors to extract specific information.

https://jsoup.org/cookbook/extracting-data/selector-syntax

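import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.junit.Test;

// Assumes JUnit 4; the class name is only illustrative.
public class ExtractTextTest {
    @Test
    public void extract8a() {
        Document doc = Jsoup.parse("<div class=\"vsr\"> \n" +
                " <a href=\"/91.1/303535.html\">L'Américain (intégral)</a> 8a \n" +
                " <span class=\"ag\">7c+</span> \n" +
                " <em>Tony Fouchereau</em> \n" +
                " <span class=\"btype\">traversée d-g, surplomb, départ assis</span> \n" +
                " <span class=\"glyphicon glyphicon-camera\" aria-hidden=\"true\"></span> \n" +
                " <span class=\"glyphicon glyphicon-film\" aria-hidden=\"true\"></span> \n" +
                "</div>");
        // ownText() returns only the text that belongs directly to the div,
        // skipping the text inside child elements such as <a> and <span>.
        Element div = doc.select("div.vsr").first();
        System.out.println(div.ownText()); // prints "8a"
    }
}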
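
ownText() joins every piece of text that sits directly inside the div, so it would return more than the grade if the div ever held other loose text. A more targeted variant, sketched below, reads only the text node that immediately follows the link. The class name GradeExtractor and the method gradeAfterLink are hypothetical names chosen for illustration, and the fallback to an empty string is an assumption about how missing data should be handled.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class, not part of jsoup.
public class GradeExtractor {

    // Returns the trimmed text that directly follows the first <a> child of
    // div.vsr, or an empty string when the markup does not have that shape.
    static String gradeAfterLink(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("div.vsr > a").first();
        if (link == null) {
            return ""; // no matching link found
        }
        Node next = link.nextSibling();
        if (next instanceof TextNode) {
            // text() normalizes whitespace; trim() drops the surrounding spaces
            return ((TextNode) next).text().trim();
        }
        return ""; // the link is followed by an element, not by text
    }

    public static void main(String[] args) {
        // \u00e9 is the escaped form of the accented e in the route name
        String html = "<div class=\"vsr\"> "
                + "<a href=\"/91.1/303535.html\">L'Am\u00e9ricain (int\u00e9gral)</a> 8a "
                + "<span class=\"ag\">7c+</span> "
                + "<em>Tony Fouchereau</em> "
                + "</div>";
        System.out.println(gradeAfterLink(html)); // prints "8a"
    }
}
```

Because the lookup is anchored on the link rather than on a pattern, it should behave the same way when the value is 2 or 7c+, and it ignores identical values that appear inside the a or span elements.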

Answer 1
0 2019-01-30 11:51:50