I need to parse the HTML text twice and collect different elements. In my first pass I collect them with, e.g.:
final Document doc = Jsoup.parse(htmlStr.getContent());
ArrayList<Element> collectEls = new ArrayList<>();
final Elements referenceElements = doc.select("[data-coll='first-pass']");
// some more logic...
referenceElements.forEach(el -> collectEls.add(el));
and in a second pass something like:
final Elements referenceElements = doc.select("[data-coll='second-pass']");
// some more logic...
referenceElements.forEach(el -> collectEls.add(el));
I canNOT collect them in one single pass; the algorithm would be far too complex. I need to sort referenceElements
by their position in the HTML text, i.e. something like
referenceElements.sort((el1, el2) -> el1.compareTo(el2)); // hypothetical compareTo(): should return a negative value if el1 appears BEFORE el2
Honestly, I have no idea how to compare them. The only related method I found is before(), but that is for inserting nodes and doesn't compare positions at all. For the sake of this example, assume that el1 and el2 are distinct, i.e. they don't overlap in the sense that one is a descendant of the other.
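One way to get such an ordering without any special parser options is to rank every element by its index in doc.getAllElements(), which, as far as I know, lists elements in depth-first (pre-order) document order, i.e. the order in which their opening tags appear. This is a swapped-in technique, not a jsoup comparison API; the class and method names (DocumentOrderSort, sortByDocumentOrder, demo) are hypothetical. An IdentityHashMap is used so the lookup key is the element instance itself rather than its content.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class DocumentOrderSort {

    // Sorts the given elements into document order. Assumes getAllElements()
    // returns elements in depth-first pre-order (opening-tag order), so an
    // element's index in that list is its rank in the document.
    static void sortByDocumentOrder(Document doc, List<Element> elements) {
        Map<Element, Integer> order = new IdentityHashMap<>();
        Elements all = doc.getAllElements();
        for (int i = 0; i < all.size(); i++) {
            order.put(all.get(i), i);
        }
        elements.sort(Comparator.comparingInt(el -> order.get(el)));
    }

    static List<String> demo() {
        String html = "<div id='a' data-coll='second-pass'></div>"
                + "<p id='b' data-coll='first-pass'></p>"
                + "<span id='c' data-coll='second-pass'></span>";
        Document doc = Jsoup.parse(html);

        // Collect in two separate passes, as in the question.
        List<Element> collectEls = new ArrayList<>();
        collectEls.addAll(doc.select("[data-coll='first-pass']"));
        collectEls.addAll(doc.select("[data-coll='second-pass']"));

        sortByDocumentOrder(doc, collectEls);

        List<String> ids = new ArrayList<>();
        for (Element el : collectEls) {
            ids.add(el.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c]
    }
}
```

Building the index costs one pass over the whole document. That is cheap for a single sort, but if you sort repeatedly against the same document it would make sense to build the map once and reuse it.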
I do not have a working Java compiler here, but I think you can start figuring it out using this information:
Enable position tracking on the Parser with setTrackPosition(true), see https://jsoup.org/apidocs/org/jsoup/parser/Parser.html#setTrackPosition(boolean)
Then use the endSourceRange() method of Element to get the position of your element's closing HTML tag and compare it to the other element's closing tag position. See https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#endSourceRange()
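The position-tracking approach can be sketched as follows, assuming jsoup 1.15.2 or later (which I believe is where setTrackPosition(), sourceRange() and endSourceRange() were introduced). Instead of endSourceRange(), it compares the opening-tag offset from sourceRange(); for non-overlapping elements both should give the same order, but an element whose closing tag is omitted in the source (e.g. a void element like <img>) may not have a tracked end range. The names SourcePositionSort, BY_SOURCE_POSITION, parseTracked and demo are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SourcePositionSort {

    // Orders elements by the character offset of their opening tag in the
    // original HTML. Only meaningful if the document was parsed with
    // position tracking enabled.
    static final Comparator<Element> BY_SOURCE_POSITION =
            Comparator.comparingInt(el -> el.sourceRange().start().pos());

    // Parses with position tracking switched on, so each node records where
    // its tags were found in the input string.
    static Document parseTracked(String html) {
        Parser parser = Parser.htmlParser().setTrackPosition(true);
        return Jsoup.parse(html, "", parser);
    }

    static List<String> demo() {
        String html = "<div id='a' data-coll='second-pass'></div>"
                + "<p id='b' data-coll='first-pass'></p>"
                + "<span id='c' data-coll='second-pass'></span>";
        Document doc = parseTracked(html);

        // Collect in two separate passes, as in the question.
        List<Element> collectEls = new ArrayList<>();
        collectEls.addAll(doc.select("[data-coll='first-pass']"));
        collectEls.addAll(doc.select("[data-coll='second-pass']"));

        collectEls.sort(BY_SOURCE_POSITION);

        List<String> ids = new ArrayList<>();
        for (Element el : collectEls) {
            ids.add(el.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c]
    }
}
```

Because the comparator only reads offsets recorded during parsing, it works on the combined list from both passes without re-traversing the document.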