How can I access elements inside a given XPath result?

I'm trying to scrape a web page like this:

<html>
etc etc..
<div id='due'>
    <h2>title</h2>
    <div>
        <div class='desc'>
           sub1
        </div>
    </div>
    <div>
        <div class='desc'>
           sub2
        </div>
    </div>
    <div>
        <div class='desc'>
           subn
        </div>
    </div>
    <h2>title2</h2>
    <div>
        <div class='desc'>
           sub1
        </div>
    </div>
    <div>
        <div class='desc'>
           sub2
        </div>
    </div>
    <div>
        <div class='desc'>
           subn
        </div>
    </div>
</div>
etc etc..
</html>

I first tried to select the children of the section:

box = tree.xpath('//*[@id="due"]/*')

Then:

for div in box:
    print(div.tag)

This correctly prints the tag of each direct child element, but if I then run:

for div in box:
    if div.tag == 'div':
        print(div.xpath('//div[@class="desc"]').text)

it performs the same search n times from the root of the document, not from each individual 'div'.

I would expect:

sub1
sub2
subn
sub1
sub2
subn

Instead it fails because a list has no ".text" attribute, and if I print each list I get:

[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]

Yes, you might think I should just run the query once, but I need to do some extra processing on each iteration and build relations between the data. How can I fix this?

Thank you in advance.

In the end I did not solve this with XPath; I just switched to bs4.

For future reference, to solve your problem with XPath, try this:

import lxml.html as lh
scr = """[your html above]"""
doc = lh.fromstring(scr)
for t in doc.xpath('//div[@id="due"]//div[@class="desc"]/text()'):
    print(t.strip())

Output:

sub1
sub2
subn
sub1
sub2
subn
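
If you need to keep the per-element loop from the question (for example to relate each description to its heading), the inner query can be made relative to the current element. In lxml, an expression that starts with '//' is evaluated from the document root even when called on a child element, while './/' searches only below that element. The following is a minimal sketch, assuming a trimmed copy of the HTML from the question; the `groups` dictionary and the variable names are illustrative and not part of the original post.

```python
import lxml.html as lh

# Trimmed copy of the HTML from the question (assumption: same structure).
html = """
<html><body>
<div id='due'>
    <h2>title</h2>
    <div><div class='desc'> sub1 </div></div>
    <div><div class='desc'> sub2 </div></div>
    <h2>title2</h2>
    <div><div class='desc'> sub1 </div></div>
    <div><div class='desc'> subn </div></div>
</div>
</body></html>
"""

tree = lh.fromstring(html)
box = tree.xpath('//*[@id="due"]/*')   # direct children: h2, div, div, h2, ...

groups = {}      # illustrative structure: heading text -> list of descriptions
current = None   # heading that the following divs belong to

for el in box:
    if el.tag == 'h2':
        current = el.text_content().strip()
        groups[current] = []
    elif el.tag == 'div' and current is not None:
        # './/' is relative to el; a leading '//' would search the whole document.
        for text in el.xpath('.//div[@class="desc"]/text()'):
            groups[current].append(text.strip())

print(groups)
```

Note that xpath() always returns a list for this kind of query, which is why calling ".text" on its result fails; iterate over the list or index into it instead.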
