How do I access a given XPath result?

Question

I am trying to scrape a web page that looks like this:

<html>
etc etc..
<div id='due'>
    <h2>title</h2>
    <div>
        <div class='desc'>
           sub1
        </div>
    </div>
    <div>
        <div class='desc'>
           sub2
        </div>
    </div>
    <div>
        <div class='desc'>
           subn
        </div>
    </div>
    <h2>title2</h2>
    <div>
        <div class='desc'>
           sub1
        </div>
    </div>
    <div>
        <div class='desc'>
           sub2
        </div>
    </div>
    <div>
        <div class='desc'>
           subn
        </div>
    </div>
</div>
etc etc..
</html>

I first select the children of that section:

box = tree.xpath('//*[@id="due"]/*')

Then:

for div in box:
    print(div.tag)

This correctly prints the tag of each direct child element, but if I do:

for div in box:
    if div.tag == 'div':
        print(div.xpath('//div[@class="desc"]').text)

the same search runs n times from the document root instead of from each individual div.

I expected:

sub1
sub2
subn
sub1
sub2
subn

Instead it fails because a list has no .text attribute, and if I print each list I get:

[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]
[sub1, sub2, subn, sub1, sub2, subn]

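Yes, you might think I should just run the query once, but I need to make some changes on each iteration and build relationships between the data. How can I solve this?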

Thanks in advance.

Answer 1

In the end I did not solve this with XPath; I simply switched to bs4.
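For readers wondering what the bs4 route might look like, here is a minimal sketch, not the poster's actual code. It assumes the `beautifulsoup4` package is installed and uses a trimmed copy of the markup from the question; the variable and function names are made up for illustration. Searching from each child with `find` keeps the lookup scoped to that child rather than the whole document.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page in the question (assumption: the real page is larger).
sample_html = """
<html><body>
<div id='due'>
    <h2>title</h2>
    <div><div class='desc'> sub1 </div></div>
    <div><div class='desc'> sub2 </div></div>
    <h2>title2</h2>
    <div><div class='desc'> subn </div></div>
</div>
</body></html>
"""

def desc_texts_bs4(source):
    """Return the text of each 'desc' div, visiting one direct child at a time."""
    soup = BeautifulSoup(source, "html.parser")
    due = soup.find(id="due")
    texts = []
    # recursive=False limits the search to direct children of #due.
    for child in due.find_all("div", recursive=False):
        # find() on the child only searches inside that child.
        desc = child.find("div", class_="desc")
        if desc is not None:
            texts.append(desc.get_text(strip=True))
    return texts

bs4_result = desc_texts_bs4(sample_html)
print(bs4_result)
```

Because each lookup starts from `child`, per-iteration changes can be made next to the `texts.append(...)` line.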

Answer 2

For future reference, to solve your problem with XPath, try the following:

import lxml.html as lh
scr = """[your html above]"""
doc = lh.fromstring(scr)
for t in doc.xpath('//div[@id="due"]//div[@class="desc"]/text()'):
    print(t.strip())

Output:

sub1
sub2
subn
sub1
sub2
subn
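The question also asked for a per-iteration lookup so that relationships between the data can be built. In XPath, an expression starting with `//` is evaluated from the document root even when called on a child element, which is why every iteration returned the full list; prefixing it with a dot (`.//`) makes it relative to the current element. The sketch below shows one way this might be used to group each description under the preceding h2. The sample markup is a trimmed copy of the question's, and the function name and the dictionary layout are assumptions made for illustration.

```python
import lxml.html as lh

# Trimmed stand-in for the page in the question (assumption: the real page is larger).
sample_page = """
<html><body>
<div id='due'>
    <h2>title</h2>
    <div><div class='desc'> sub1 </div></div>
    <div><div class='desc'> sub2 </div></div>
    <h2>title2</h2>
    <div><div class='desc'> sub1 </div></div>
    <div><div class='desc'> subn </div></div>
</div>
</body></html>
"""

def group_by_heading(source):
    """Map each h2 title to the 'desc' texts of the divs that follow it."""
    doc = lh.fromstring(source)
    groups = {}
    current = None
    for child in doc.xpath('//*[@id="due"]/*'):
        if child.tag == 'h2':
            # Start a new group whenever a heading is encountered.
            current = child.text_content().strip()
            groups[current] = []
        elif child.tag == 'div' and current is not None:
            # The leading dot makes the search relative to this child only.
            for desc in child.xpath('.//div[@class="desc"]'):
                groups[current].append(desc.text_content().strip())
    return groups

grouped = group_by_heading(sample_page)
print(grouped)
```

Note that `xpath()` always returns a list, so the elements have to be iterated (or indexed) before reading their text; this is the cause of the "list has no .text attribute" error in the question.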


Answer 1: 0, accepted, 2020-11-17 22:23:15
Answer 2: 0, 2020-11-18 13:45:11
