
Jsoup: check if Element is before another (sorting)?

I need to parse the HTML text twice and collect different elements each time. In the first pass I collect them with, e.g.,

    final Document doc = Jsoup.parse(htmlStr.getContent());
    ArrayList<Element> collectEls=new ArrayList<>();
    final Elements referenceElements = doc.select("[data-coll='first-pass']");
    // some more  logic...
    referenceElements.forEach(el -> collectEls.add(el));

and in a second pass with something like

    final Elements referenceElements = doc.select("[data-coll='second-pass']");
    // some more  logic...
    referenceElements.forEach(el -> collectEls.add(el));

I cannot collect them in a single pass; the algorithm would be far too complex. I need to sort referenceElements by their position in the HTML text, i.e. something like

    referenceElements.sort((el1, el2) -> el1.compareTo(el2)); // should return a negative value if el1 appears BEFORE el2

Honestly, I have no clue how to compare them. I only found the before method, but that is for inserting nodes and doesn't perform any kind of check. For the sake of the example, assume that el1 and el2 are distinct and do not overlap, i.e. neither is a child of the other.

I do not have a working Java compiler here, but I think you can start figuring it out using this information:

  1. Before you parse, turn on position tracking for HTML nodes with setTrackPosition: https://jsoup.org/apidocs/org/jsoup/parser/Parser.html#setTrackPosition(boolean)
  2. Use the endSourceRange method of Element to get the position of the element's closing HTML tag, and compare it with the closing-tag position of the other element. See https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#endSourceRange()
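
The two steps above could be combined roughly as follows. This is an untested sketch, assuming a jsoup version that provides both setTrackPosition and endSourceRange; the class name SortByPosition, the helper sortedIds, and the id attributes in the sample HTML are made up for illustration. It also assumes every collected element has an explicit closing tag, because an implicitly closed element has no tracked end position.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class SortByPosition {

    // Hypothetical helper: collects elements in two passes, sorts them by
    // source position, and returns their ids in document order.
    static List<String> sortedIds(String html) {
        // Step 1: enable position tracking BEFORE parsing.
        Parser parser = Parser.htmlParser().setTrackPosition(true);
        Document doc = Jsoup.parse(html, "", parser);

        // Two separate passes; the list ends up out of document order.
        List<Element> collectEls = new ArrayList<>();
        collectEls.addAll(doc.select("[data-coll='second-pass']"));
        collectEls.addAll(doc.select("[data-coll='first-pass']"));

        // Step 2: compare the character offsets of the closing tags.
        // Assumption: the elements do not nest and all have explicit closing tags.
        collectEls.sort((Element el1, Element el2) -> Integer.compare(
                el1.endSourceRange().start().pos(),
                el2.endSourceRange().start().pos()));

        List<String> ids = new ArrayList<>();
        for (Element el : collectEls) {
            ids.add(el.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<p id=\"a\" data-coll=\"first-pass\">one</p>"
                + "<p id=\"b\" data-coll=\"second-pass\">two</p>"
                + "<p id=\"c\" data-coll=\"first-pass\">three</p>"
                + "</body></html>";
        System.out.println(sortedIds(html));
    }
}
```

If the elements may lack closing tags, comparing el.sourceRange().start().pos() (the position of the opening tag) is probably the safer choice, since every parsed element has a start position.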
