
Select and modify xpath nodes after specific text


I use the following code to get all the names:

def parse_authors(self, root): 
    author_nodes = root.xpath('//a[@class="booklink"][contains(@href,"/author/")]/text()')
    if author_nodes:
        return [unicode(author) for author in author_nodes]

But if there are translators, I want to add "(translation)" next to their names:

Example: translator1(translation)

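You can use the "translation:" text node to tell authors from translators: authors are preceding siblings of the "translation:" text node, and translators are following siblings of it.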

Authors:

//text()[contains(., 'translation:')]/preceding-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()

Translators:

//text()[contains(., 'translation:')]/following-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()

Working example code:

from lxml.html import fromstring

data = """
<td>
    <a class="booklink" href="/author/43710/Author 1">Author 1</a>
    ,
     <a class="booklink" href="/author/46907/Author 2">Author 2</a>
     <br>
     translation:
     <a class="booklink" href="/author/47669/translator 1">Translator 1</a>
     ,
     <a class="booklink" href="/author/9382/translator 2">Translator 2</a>
</td>"""

root = fromstring(data)

authors = root.xpath("//text()[contains(., 'translation:')]/preceding-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()")
translators = root.xpath("//text()[contains(., 'translation:')]/following-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()")

print(authors)
print(translators)

Prints:

['Author 1', 'Author 2']
['Translator 1', 'Translator 2']
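
To get the single list the question asks for, the two queries can be merged and the suffix appended to each translator name. The sketch below is one possible way to do that, not part of the original answer: the function name names_with_roles, the suffix parameter, and the div-wrapped sample markup are hypothetical. It also selects authors as "links with no translation: text node before them", on the assumption that some pages have no translators at all; in that case the preceding-sibling query above would return an empty author list.

```python
from lxml.html import fromstring

# Shared XPath fragments (same predicates as in the queries above).
LINK = "a[@class='booklink' and contains(@href, '/author/')]"
MARKER = "text()[contains(., 'translation:')]"


def names_with_roles(root, suffix="(translation)"):
    """Return author names followed by translator names with a suffix.

    Hypothetical helper: `root` is any lxml element containing the links.
    """
    # Authors: links that have no "translation:" text node before them.
    # This still matches when the page has no translators at all.
    authors = root.xpath(
        ".//%s[not(preceding-sibling::%s)]/text()" % (LINK, MARKER))
    # Translators: links that come after the "translation:" text node.
    translators = root.xpath(
        ".//%s[preceding-sibling::%s]/text()" % (LINK, MARKER))
    return [str(a) for a in authors] + [str(t) + suffix for t in translators]


# Sample markup modelled on the example above (wrapped in a div here).
sample = """
<div>
    <a class="booklink" href="/author/43710/Author 1">Author 1</a>
    ,
    <a class="booklink" href="/author/46907/Author 2">Author 2</a>
    <br>
    translation:
    <a class="booklink" href="/author/47669/translator 1">Translator 1</a>
    ,
    <a class="booklink" href="/author/9382/translator 2">Translator 2</a>
</div>"""

print(names_with_roles(fromstring(sample)))
```

With the sample markup this should list the two authors first and then the two translators, each followed by "(translation)". In the question's Python 2 code, unicode() would take the place of str().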

