How can I search inside a given XPath result?
I'm trying to scrape a web page like this:
<html>
etc etc..
<div id='due'>
<h2>title</h2>
<div>
<div class='desc'>
sub1
</div>
</div>
<div>
<div class='desc'>
sub2
</div>
</div>
<div>
<div class='desc'>
subn
</div>
</div>
<h2>title2</h2>
<div>
<div class='desc'>
sub1
</div>
</div>
<div>
<div class='desc'>
sub2
</div>
</div>
<div>
<div class='desc'>
subn
</div>
</div>
</div>
etc etc..
</html>
I first tried to scrape the section:
box = tree.xpath('//*[@id="due"]/*')
Then:
for div in box:
    print(div.tag)
This correctly prints the tag of each child element, but if I do:
for div in box:
    if div.tag == 'div':
        print(div.xpath('//div[@class="desc"]').text)
it performs the same search n times from the start of the document rather than from each individual 'div'.
I would expect:
sub1
sub2
subn
sub1
sub2
subn
Instead I get an error saying that a list has no ".text" attribute, and if I print each list I get:
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
Yes, you might think I should just run the query once, but I need to make some changes on every iteration and build relations between the data, so how can I fix this?
Thank you in advance.
In the end I didn't solve the problem with XPath; I just moved to bs4.
For future reference, to solve your problem with XPath, try this:
import lxml.html as lh
scr = """[your html above]"""
doc = lh.fromstring(scr)
for t in doc.xpath('//div[@id="due"]//div[@class="desc"]/text()'):
    print(t.strip())
Output:
sub1
sub2
subn
sub1
sub2
subn
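If you still need to work on each 'div' separately, as the question describes, the likely cause of the repeated results is the leading `//`, which searches from the document root even when called on an element. Starting the expression with `.//` makes it relative to the current element. Also note that `xpath()` returns a list, so `.text` has to be read from each item. Here is a minimal sketch of that idea, assuming a shortened version of the HTML from the question; the `groups` dictionary and the grouping by the preceding `<h2>` are only an illustration of one way to build "data relations", not something taken from the original code:

```python
import lxml.html as lh

# Shortened version of the question's HTML (an assumption, for illustration)
html = """
<html><body>
<div id='due'>
  <h2>title</h2>
  <div><div class='desc'> sub1 </div></div>
  <div><div class='desc'> sub2 </div></div>
  <h2>title2</h2>
  <div><div class='desc'> sub1 </div></div>
  <div><div class='desc'> subn </div></div>
</div>
</body></html>
"""

tree = lh.fromstring(html)
box = tree.xpath('//*[@id="due"]/*')

groups = {}     # hypothetical structure: heading text -> list of descriptions
current = None  # text of the most recent <h2> seen
for el in box:
    if el.tag == 'h2':
        current = el.text_content().strip()
        groups.setdefault(current, [])
    elif el.tag == 'div':
        # The leading "." limits the search to descendants of `el`
        for desc in el.xpath('.//div[@class="desc"]'):
            groups.setdefault(current, []).append(desc.text_content().strip())

print(groups)
```

Each pass through the loop now only sees the 'desc' elements inside its own 'div', so per-iteration logic can be added where the list is filled.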