
Jsoup selector - How to select the first 5 <p> elements inside <div> element

I have a set of HTML elements like the following:

<div class="abcdefghijk">
   <p>a</p>
   <p>b</p>
   <p>c</p>
   <p>d</p>
   <p>e</p>
   <p>f</p>
   <p>h</p>
   <p>i</p>
   <p>j</p>
   <p>k</p>
</div>

I want to select the first 5 <p> elements. Please help!

The selector syntax page at https://jsoup.org/cookbook/extracting-data/selector-syntax describes:

:lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)

So, based on your example, all you need is select("div.abcdefghijk p:lt(5)").

Demo:

String html = " <div class=\"abcdefghijk\">\r\n" + 
        "   <p>a</p>\r\n" + 
        "   <p>b</p>\r\n" + 
        "   <p>c</p>\r\n" + 
        "   <p>d</p>\r\n" + 
        "   <p>e</p>\r\n" + 
        "   <p>f</p>\r\n" + 
        "   <p>h</p>\r\n" + 
        "   <p>i</p>\r\n" + 
        "   <p>j</p>\r\n" + 
        "   <p>k</p>\r\n" + 
        "</div>";

Document doc = Jsoup.parse(html);
Elements elements = doc.select("div.abcdefghijk p:lt(5)");
for (Element el : elements){
    System.out.println(el);
}

Output:

<p>a</p>
<p>b</p>
<p>c</p>
<p>d</p>
<p>e</p>
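
One caveat: :lt(n) compares the sibling index among all element children of the parent, not the position among the <p> elements only. The sketch below assumes a hypothetical variant of the markup with an <h2> placed before the paragraphs; the class and method names are illustrative and not part of jsoup. It compares p:lt(5) with a count-based approach that selects every <p> and keeps the first five.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SiblingIndexDemo {
    // Hypothetical markup: paragraphs from the question, preceded by a heading.
    static final String HTML =
        "<div class=\"abcdefghijk\">"
        + "<h2>title</h2>"
        + "<p>a</p><p>b</p><p>c</p><p>d</p><p>e</p><p>f</p><p>h</p>"
        + "</div>";

    // Uses :lt(5): keeps <p> elements whose sibling index is 0..4.
    // The <h2> occupies index 0, so only four paragraphs qualify.
    static List<String> byLt() {
        return Jsoup.parse(HTML).select("div.abcdefghijk > p:lt(5)").eachText();
    }

    // Selects every <p> child, then keeps at most the first five matches.
    static List<String> firstFiveParagraphs() {
        return Jsoup.parse(HTML).select("div.abcdefghijk > p").stream()
            .limit(5)
            .map(Element::text)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(byLt());                // [a, b, c, d]
        System.out.println(firstFiveParagraphs()); // [a, b, c, d, e]
    }
}
```

When the <div> contains nothing but <p> elements, as in the question, both approaches return the same five paragraphs.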

To get the expected result, you can also use the nth-child selector:

:nth-child(-n+5)
select("div.abcdefghijk :nth-child(-n+5)")

https://jsoup.org/apidocs/org/jsoup/select/Selector.html

If you want to select all of them anyway, but do something special with the first 5, use Elements#subList(fromIndex, toIndex) (inherited from ArrayList):

Returns a view of the portion of this list between the specified fromIndex, inclusive, and toIndex, exclusive.

String html = 
    "<div class=\"abcdefghijk\">" +
        "<p>a</p><p>b</p><p>c</p><p>d</p><p>e</p>" + // get these
        "<p>f</p><p>h</p><p>i</p><p>j</p><p>k</p>" +
    "</div>";
Document doc = Jsoup.parse(html);
Elements paras = doc.select("div.abcdefghijk p");
for (Element el : paras.subList(0, Math.min(5, paras.size()))) {
    System.out.println(el);
}

Output:

<p>a</p>
<p>b</p>
<p>c</p>
<p>d</p>
<p>e</p>
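
The Math.min guard matters because subList throws IndexOutOfBoundsException when toIndex exceeds the list size. As a minimal sketch using only the standard library (the FirstN class and firstN helper are hypothetical names, and n is assumed to be non-negative), the same pattern on a plain list looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstN {
    // Returns a view of at most n leading items.
    // Clamping with Math.min keeps short lists from throwing.
    static <T> List<T> firstN(List<T> items, int n) {
        return items.subList(0, Math.min(n, items.size()));
    }

    public static void main(String[] args) {
        List<String> many = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "e", "f", "h"));
        List<String> few = new ArrayList<>(Arrays.asList("a", "b"));

        System.out.println(firstN(many, 5)); // [a, b, c, d, e]
        System.out.println(firstN(few, 5));  // [a, b]

        // The sublist is a view: a write through it shows up in the backing list.
        firstN(many, 5).set(0, "A");
        System.out.println(many.get(0));     // A
    }
}
```

Since the result is a view rather than a copy, wrap it in new ArrayList<>(...) if the original list may be structurally modified afterwards.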

