Jsoup: check if Element is before another (sorting)?
I need to parse the HTML text twice and collect different elements. In my first pass I collect them with, e.g.:
final Document doc = Jsoup.parse(htmlStr.getContent());
ArrayList<Element> collectEls=new ArrayList<>();
final Elements referenceElements = doc.select("[data-coll='first-pass']");
// some more logic...
referenceElements.forEach(el -> collectEls.add(el));
and in a second pass with something like:
final Elements referenceElements = doc.select("[data-coll='second-pass']");
// some more logic...
referenceElements.forEach(el -> collectEls.add(el));
I cannot collect them in one single pass; the algorithm would be far too complex. I need to sort referenceElements depending on their position in the HTML text, i.e. something like:
referenceElements.sort((el1,el2) -> el1.compareTo(el2)); // wished-for: a negative result if el1 appears BEFORE el2
Honestly, I have no clue how to compare them. I only found the before method, but that is for inserting and doesn't perform any kind of check. For the sake of the example, assume that el1 and el2 are distinct, i.e. they do not overlap in the sense that one is a child of the other.
I do not have a working Java compiler here, but I think you can start figuring it out using this information:
Enable position tracking on the parser with setTrackPosition. See https://jsoup.org/apidocs/org/jsoup/parser/Parser.html#setTrackPosition(boolean)
Use the endSourceRange method of Element to get the position of the closing HTML tag of your element in order to compare it to the other element's closing tag position. See https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#endSourceRange()
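The two hints above could be combined roughly as follows. This is only a sketch, assuming a jsoup version that ships position tracking (introduced in 1.15.2, as far as I recall). The class DocumentOrderSort and its methods are hypothetical names made up for illustration. Note that it compares the start-tag offset from sourceRange() rather than the closing-tag offset from endSourceRange(), because the start tag is always present in the source while a closing tag may be implicit; for elements that do not overlap, either offset should give the same order.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class DocumentOrderSort {

    // Parse with position tracking switched on; without it, source ranges are untracked.
    static Document parseTracked(String html) {
        Parser parser = Parser.htmlParser().setTrackPosition(true);
        return Jsoup.parse(html, "", parser);
    }

    // Sort in place by the character offset of each element's start tag in the input.
    static void sortByDocumentOrder(List<Element> elements) {
        elements.sort(Comparator.comparingInt(
                (Element el) -> el.sourceRange().start().pos()));
    }

    // Demo: collect in two passes (as in the question), sort once, return the ids.
    static List<String> collectSortedIds(String html) {
        Document doc = parseTracked(html);
        List<Element> collected = new ArrayList<>();
        collected.addAll(doc.select("[data-coll='first-pass']"));
        collected.addAll(doc.select("[data-coll='second-pass']"));
        sortByDocumentOrder(collected);

        List<String> ids = new ArrayList<>();
        for (Element el : collected) {
            ids.add(el.id());
        }
        return ids;
    }
}
```

Both passes have to run against the same Document, parsed with tracking enabled, for the offsets to be comparable. Elements that jsoup creates itself (for example an implied body) have no position in the source and report an untracked range, so they would sort before everything else.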