
Jsoup select element at depth (DOM level from parents)

Here's some raw HTML (taken from a large file):

<h1 class="contentHeader">This is the header</h1>

Using JSoup's traverse method, I've gone through the DOM and located this element, along with its attributes, like so:

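doc.traverse(new NodeVisitor() {

    @Override
    public void head(Node node, int depth) {
        // Called when a node is first visited; depth is 0 for the root node.
        System.out.println(node);
        System.out.println("Node depth: " + depth);
        Attributes attrList = node.attributes();
        for (Attribute attr : attrList) {
            System.out.println(attr);
        }
        // ....
    }

    @Override
    public void tail(Node node, int depth) {
        // Nothing to do when leaving a node.
    }
});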

This produces:

<h1 class="contentHeader">This is the header</h1>
Node depth: 8
class="contentHeader"

What I'm now trying to do is write a single-line implementation for finding this element. I've been reading through the JSoup Cookbook, and it seems that it should be possible by using the eq selector to specify a depth, but I'm having no luck. The best I can come up with is this:

System.out.println(doc.select("h1.contentHeader:eq(8)"));   

But this outputs no data. I'm either missing something crucial, misunderstanding the API, or just being plain wrong.

Any input or advice would be greatly appreciated.

eq is a pseudo-selector that originated in jQuery (it is not part of standard CSS), and it is not used to select by depth. Here is the jQuery documentation's explanation of what eq does:

The index-related selectors (:eq(), :lt(), :gt(), :even, :odd) filter the set of elements that have matched the expressions that precede them. They narrow the set down based on the order of the elements within this matched set. For example, if elements are first selected with a class selector (.myclass) and four elements are returned, these elements are given indices 0 through 3 for the purposes of these selectors.

Note that since JavaScript arrays use 0-based indexing, these selectors reflect that fact. This is why $(".myclass:eq(1)") selects the second element in the document with the class myclass, rather than the first. In contrast, :nth-child(n) uses 1-based indexing to conform to the CSS specification.

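So, eq is not about depth. (jsoup's own selector documentation describes :eq(n) as matching elements whose sibling index equals n, which differs slightly from the jQuery behaviour quoted above. Either way it is an index, not a depth.)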
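To see that :eq() works on an index rather than a depth, here is a minimal sketch. It assumes jsoup is on the classpath; the sample HTML, the class name EqIsNotDepth and the count helper are made up for illustration. Going by jsoup's selector documentation, :eq(1) should match the h1 below because it is the second child of its parent, while :eq(8) should match nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class EqIsNotDepth {

    // Hypothetical helper: number of elements matched by a CSS query.
    static int count(Document doc, String cssQuery) {
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Made-up sample: the h1 is the second child element of the div.
        Document doc = Jsoup.parse(
                "<div><p>intro</p><h1 class=\"contentHeader\">This is the header</h1></div>");

        // No index filter: matches the h1 regardless of position.
        System.out.println(count(doc, "h1.contentHeader"));

        // Index 1 among its siblings (zero-based), so this should still match.
        System.out.println(count(doc, "h1.contentHeader:eq(1)"));

        // No h1.contentHeader sits at sibling index 8, so this should be empty.
        System.out.println(count(doc, "h1.contentHeader:eq(8)"));
    }
}
```

That would explain why the query in the question prints nothing: the h1 is at depth 8, but it is unlikely to also be the ninth child of its parent.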

But if your HTML has a class attribute, why not use it:

System.out.println(doc.select("h1.contentHeader"));

You can also write a very specific descendant selector for this node (it is just an example, since I don't know your HTML structure):

System.out.println(doc.select("body div .someClass div div h1.contentHeader"));
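If you really do need to constrain the match by depth, I'm not aware of a jsoup selector for that, but you can filter the results of select() yourself. The following is only a sketch under some assumptions: jsoup is on the classpath, the sample HTML is made up, and depthOf / selectAtDepth are hypothetical helper names, not part of jsoup. The depth is counted by walking parent() up to the Document, which should line up with the depth that doc.traverse(...) reports (0 for the document itself).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

import java.util.List;
import java.util.stream.Collectors;

class DepthSelect {

    // Counts how many ancestors a node has, including the Document itself.
    static int depthOf(Node node) {
        int depth = 0;
        for (Node n = node; n.parent() != null; n = n.parent()) {
            depth++;
        }
        return depth;
    }

    // Hypothetical helper: run a CSS query, then keep only matches at the given depth.
    static List<Element> selectAtDepth(Document doc, String cssQuery, int depth) {
        return doc.select(cssQuery).stream()
                .filter(el -> depthOf(el) == depth)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample; jsoup wraps it in <html><head></head><body>...</body></html>.
        String html = "<div><div><h1 class=\"contentHeader\">This is the header</h1></div></div>";
        Document doc = Jsoup.parse(html);

        // document(0) > html(1) > body(2) > div(3) > div(4) > h1(5)
        for (Element el : selectAtDepth(doc, "h1.contentHeader", 5)) {
            System.out.println(el.outerHtml());
        }
    }
}
```

For the document in the question you would pass 8 as the depth. In most cases the class selector alone is simpler and less brittle, since the depth changes whenever a wrapper element is added or removed.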
