Jsoup: get all elements before a certain element / remove all elements after a certain element

Suppose I have HTML like this:

<div class="pets">
  <div class="pet">...</div>
  <div class="pet">...</div>
  <div class="pet">...</div>
  <div class="pet">...</div>
  <div class="friends-pets">Your friends have these pets:</div>
  <div class="pet">...</div>
  <div class="pet">...</div>
  <div class="pet">...</div>
  <div class="pet">...</div>
  <div class="pet">...</div>
  <div class="pet">...</div>
</div>

I only want the <div class="pet"> elements that come before <div class="friends-pets">. Is there a way to do that with Jsoup? I know I can get all pets like this:

Element petsWrapper = document.selectFirst(".pets");
Elements pets = petsWrapper.select(".pet");

but that would include the extra pets too. Can I select only the pets above the marker, or remove the pets below it and then use that code?

There is a very simple way to do it with a single selector:

.pet:not(.friends-pets ~ .pet)

This works by combining the :not() selector with .friends-pets ~ .pet, which matches every .pet sibling that comes after the .friends-pets element. Those matches are then excluded from the full set of .pet matches.

See a working online example here: try.jsoup
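
To try the selector locally, a minimal sketch could look like the one below. The class name PetsBeforeMarker, the helper method, and the pet names in the sample markup are made up for illustration; only the selector itself comes from the answer above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class PetsBeforeMarker {
    // Sample markup modelled on the question; the pet names are placeholders.
    static final String HTML =
        "<div class=\"pets\">"
        + "<div class=\"pet\">cat</div>"
        + "<div class=\"pet\">dog</div>"
        + "<div class=\"friends-pets\">Your friends have these pets:</div>"
        + "<div class=\"pet\">parrot</div>"
        + "<div class=\"pet\">hamster</div>"
        + "</div>";

    // Returns the .pet elements that are not later siblings of .friends-pets.
    static Elements petsBeforeMarker(String html) {
        Document document = Jsoup.parse(html);
        return document.select(".pet:not(.friends-pets ~ .pet)");
    }

    public static void main(String[] args) {
        for (Element pet : petsBeforeMarker(HTML)) {
            System.out.println(pet.text());
        }
    }
}
```

Because the filtering happens inside the selector, the document itself is left unchanged.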

Explanation in comments:

Element petsWrapper = document.selectFirst(".pets");
Elements pets = petsWrapper.select(".pet");
// select middle element
Element middleElement = petsWrapper.selectFirst(".friends-pets");
// remove from "pets" every element that comes after the middle element
pets.removeAll(middleElement.nextElementSiblings());
System.out.println(pets);
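
The question also asks about removing the later pets and then reusing the original query. A possible sketch of that variant follows; it assumes the wrapper and the marker both exist (selectFirst returns null otherwise), and the class name, method name, and sample pet names are invented for the example. Note that, unlike the snippet above, this modifies the parsed document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RemoveAfterMarker {
    // Sample markup modelled on the question; the pet names are placeholders.
    static final String HTML =
        "<div class=\"pets\">"
        + "<div class=\"pet\">cat</div>"
        + "<div class=\"pet\">dog</div>"
        + "<div class=\"friends-pets\">Your friends have these pets:</div>"
        + "<div class=\"pet\">parrot</div>"
        + "<div class=\"pet\">hamster</div>"
        + "</div>";

    static Elements petsAfterPruning(String html) {
        Document document = Jsoup.parse(html);
        Element petsWrapper = document.selectFirst(".pets");
        Element marker = petsWrapper.selectFirst(".friends-pets");
        // Detach every element sibling that follows the marker from the document.
        marker.nextElementSiblings().remove();
        // The question's original query now only sees the pets that are left.
        return petsWrapper.select(".pet");
    }

    public static void main(String[] args) {
        System.out.println(petsAfterPruning(HTML).eachText());
    }
}
```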

I'm gonna check out Krystian's answer, but having tried to solve this myself, I've come up with this one:

// get all divs
Elements divElements = doc.select("div");
// valid pet divs will be collected here
List<Element> pets = new ArrayList<>();
for (Element divElement : divElements) {
    if (divElement.hasClass("friends-pets")) {
        // marker div reached, so the loop stops here
        break;
    }

    // hasClass matches a whole class name, so the "pets" wrapper div is not
    // added by mistake, which className().contains("pet") would do
    if (divElement.hasClass("pet")) {
        // no marker div seen so far, so this is one of our own pets
        pets.add(divElement);
    }
}
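
The same stop-at-the-marker idea can also be sketched as a stream over the wrapper's direct children, which avoids walking every div in the document. This assumes Java 9 or later for takeWhile and that the .pets wrapper exists; the class name, method name, and sample pet names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TakeWhilePets {
    // Sample markup modelled on the question; the pet names are placeholders.
    static final String HTML =
        "<div class=\"pets\">"
        + "<div class=\"pet\">cat</div>"
        + "<div class=\"pet\">dog</div>"
        + "<div class=\"friends-pets\">Your friends have these pets:</div>"
        + "<div class=\"pet\">parrot</div>"
        + "</div>";

    static List<Element> petsBeforeMarker(String html) {
        Element petsWrapper = Jsoup.parse(html).selectFirst(".pets");
        return petsWrapper.children().stream()
            // stop at the first child that carries the marker class
            .takeWhile(child -> !child.hasClass("friends-pets"))
            // keep only real pet entries among the children seen so far
            .filter(child -> child.hasClass("pet"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Element pet : petsBeforeMarker(HTML)) {
            System.out.println(pet.text());
        }
    }
}
```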

If you think there's something wrong with this answer, please let me know. If you have reasons why I should use one of the other two answers over the other, please comment too!
