
Jsoup: check if Element is before another (sorting)?

I need to parse the HTML text twice and collect different elements. In my first pass I collect them with, e.g.:

    final Document doc = Jsoup.parse(htmlStr.getContent());
    ArrayList<Element> collectEls=new ArrayList<>();
    final Elements referenceElements = doc.select("[data-coll='first-pass']");
    // some more  logic...
    referenceElements.forEach(el -> collectEls.add(el));

and in a second pass with something like:

    final Elements referenceElements = doc.select("[data-coll='second-pass']");
    // some more  logic...
    referenceElements.forEach(el -> collectEls.add(el));

I cannot collect them in one single pass; the algorithm would be far too complex. I need to sort referenceElements depending on their position in the HTML text, i.e. something like:

    referenceElements.sort((el1, el2) -> el1.compareTo(el2)); // would return true if el1 appears BEFORE el2

Honestly, I have no clue how to compare them. I only found the before method, but that is for inserting and doesn't perform any kind of check. For the sake of example, assume that el1 and el2 are distinct, i.e. they do not overlap in the sense that one is a child of the other.

I do not have a working Java compiler here, but I think you can start figuring it out using this information:

  1. Before you parse, turn on position tracking for HTML nodes with setTrackPosition: https://jsoup.org/apidocs/org/jsoup/parser/Parser.html#setTrackPosition(boolean)
  2. Use the endSourceRange method of Element to get the position of your element's closing HTML tag, and compare it to the position of the other element's closing tag. See https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#endSourceRange()
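
The two steps above could be combined roughly as follows. This is an untested sketch, assuming a jsoup version that supports position tracking; the class name SourceOrderSort, the helper collectInSourceOrder, and the sample HTML are made up for illustration. It compares the start offset of each element's closing tag, which should be enough when the elements are not nested in each other, as stated in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SourceOrderSort {

    // Hypothetical helper: collects elements in two passes,
    // then sorts them by where their closing tag sits in the source.
    static List<Element> collectInSourceOrder(String html) {
        // Step 1: position tracking has to be enabled before parsing.
        Parser parser = Parser.htmlParser().setTrackPosition(true);
        Document doc = Jsoup.parse(html, "", parser);

        List<Element> collected = new ArrayList<>();
        collected.addAll(doc.select("[data-coll='first-pass']"));
        collected.addAll(doc.select("[data-coll='second-pass']"));

        // Step 2: compare the character offset of each closing tag.
        // Assumption: every collected element has an explicit closing tag,
        // otherwise its end range may not be tracked.
        collected.sort(Comparator.comparingInt(
                (Element el) -> el.endSourceRange().start().pos()));
        return collected;
    }

    public static void main(String[] args) {
        String html = "<div>"
                + "<p id=\"a\" data-coll=\"second-pass\">A</p>"
                + "<p id=\"b\" data-coll=\"first-pass\">B</p>"
                + "<p id=\"c\" data-coll=\"second-pass\">C</p>"
                + "</div>";

        // The ids should come out in document order, not in collection order.
        for (Element el : collectInSourceOrder(html)) {
            System.out.println(el.id());
        }
    }
}
```

If the elements can be nested, or have no explicit closing tag, comparing sourceRange() (the position of the opening tag) instead of endSourceRange() may be the safer choice.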
